Generating one or more XML documents from a relational database using XPath data model

ABSTRACT

A technique is provided for creating metadata for fast search of XML documents stored as column data. Data is stored in a data store connected to a computer. A main table is created having a column for storing a document, wherein the document has one or more elements or attributes. One or more side tables are created, wherein each side table stores one or more elements or attributes. Then, the side tables are used to locate data in the main table with scalable indexing mechanisms to facilitate search.  
     A technique is provided for generating one or more XML documents from a single SQL query. Data stored on a data storage device that is connected to a computer is transformed. A query that selects data in the data storage device is received. The selected data is retrieved into a work space. Then, one or more XML documents are generated to consist of the selected data.  
     A technique is provided for generating one or more XML documents from a relational database using the XPath data model. Data stored on a data storage device that is connected to a computer is transformed. Initially, a document object model tree is generated using a document access definition, which defines the mapping between an XML tree structure and relational tables. The document object model tree is traversed to obtain information to retrieve relational data. The relational data is mapped to one or more XML documents.  
     A technique is provided to store fragmented XML data into a relational database by decomposing XML documents with application specific mappings. Data stored on a data store that is connected to a computer is transformed. Initially, an XML document containing XML data is received. A document access definition that identifies one or more relational tables and columns is received. The XML data is mapped from the application DTD to the relational tables and columns using the document access definition based on the XPath data model.

PROVISIONAL APPLICATION

[0001] This application claims the benefit of U.S. ProvisionalApplication No. 60/168,659. entitled “XML DOCUMENT PROCESSING,” filed onDec. 2, 1999, by Isaac Cheng. et al., attorney's reference numberST9-99-106, which is incorporated by reference herein.

FIELD OF THE INVENTION

[0002] This invention relates in general to computer-implementeddatabase systems, and, in particular, to processing Extensible MarkupLanguage (XML) documents.

BACKGROUND OF THE INVENTION

[0003] The Internet is a collection of computer networks that exchangeinformation via Hyper Text Transfer Protocol (HTTP). The Internetcomputer network consists of many internet networks. Currently, the useof the Internet computer network for commercial and noncommercial usesis exploding. Via its networks, the Internet computer network enablesmany users in different locations to access information stored in datasources (e.g., databases) stored in different locations.

[0004] The World Wide Web (i.e., the “WWW” or the “Web”) is a hypertextinformation and communication system used on the Internet computernetwork with data communications operating according to a client/servermodel. Typically, a Web client computer will request data stored in datasources from a Web server computer, at which Web server softwareresides. The Web server software interacts with an interface connectedto, for example. a Database Management System (“DBMS”), which isconnected to the data sources. These computer programs residing at theWeb server computer will retrieve the data and transmit the data to theclient computer. The data can be any type of information, includingdatabase data, static data, HTML data, or dynamically generated data.

[0005] With the fast growing popularity of the Internet and the WorldWide Web (also known as “WWW” or the “Web”), there is also a fastgrowing demand for Web access to databases.

[0006] Databases are computerized information storage and retrievalsystems. A Relational Database Management System (RDBMS) is a databasemanagement system (DBMS) which uses relational techniques for storingand retrieving data. Relational databases are organized into physicaltables which consist of rows and columns of data. The rows are formallycalled tuples. A database will typically have many physical tables andeach physical table will typically have multiple tuples and multiplecolumns. The physical tables are typically stored on random accessstorage devices (RASED) such as magnetic or optical disk drives forsemi-permanent storage. Additionally, logical tables or “views” can begenerated based on the physical tables and provide a particular way oflooking at the database. A view arranges rot s in some order, withoutaffecting the physical organization of the database.

[0007] RDBMS software using a Structured Query Language (SQL) interfaceis shell known in the art. The SQL interface has evolved into a standardlanguage for RDBMS software and has been adopted as such by both theAmerican National Standards Institute (ANSI) and the InternationalStandards Organization (ISO).

[0008] The SQL interface allows users to formulate relational operationson the tables either interactively, in batch files, or embedded in hostlanguages, such as C and COBOL. SQL allows the user to manipulate thedata. The definitions for SQL provide that a RDBMS should respond to aparticular query with a particular set of data given a specifieddatabase content, but the technique that the RDBMS uses to actually findthe required information in the tables on the disk drives is left up tothe RDBMS. Typically, there will be more than one technique that can beused by the RDBMS to access the required data. The RDBMS will optimizethe technique used to find the data requested in a query in order tominimize the computer time used and, therefore, the cost of performingthe query.

[0009] Additionally, an index is an ordered set of references to therecords or rows in a database file or table. The index is used to accesseach record in the file using a key (i.e., one of the fields of therecord or attributes of the row). When data is to be retrieved, an indexis used to locate records. Then, the data is sorted into auser-specified order and returned to the user.

[0010] Extensible Markup Language (XML) is a new specification that isquickly gaining popularity for creating what are termed “XML documents”.XML documents comprise structured data. XML documents are being sharedbetween multiple businesses and between businesses and customers.

[0011] When XML documents are stored as column data, searching fordesired XML data can be time-consuming. Typically, a search for XML datawould require searching each XML document. This is usually called adocument scan. Thus, there is a need in the art for an improvedtechnique for searching for XML documents stored as column data.

[0012] With the longstanding use of relational databases, manybusinesses have stored their data in relational tables. In order toshare this data with businesses that are using XML documents, the datain the relational databases may be manually selected, retrieved. andstored into XML documents. This is a long, tedious task. Thus, there isa need for an improved technique of selecting, retrieving, and storingrelational data into XML documents.

[0013] In order to share relational data with other businesses that areusing XML documents, a user may manually convert the relational datainto XML documents. This is time consuming and inefficient. Thus, thereis a need for an improved technique of generating XML documents fromrelational data.

[0014] Additionally, when an XML document is received, a user may needto store the data from the XML document into a relational database.Currently, this is a time consuming processing in which a user manuallytransfers the data from the XML document to the relational database.Thus, there is a need for an improved technique of decomposing an XMLdocument and storing the decomposed data into a relational database.

SUMMARY OF THE INVENTION

[0015] To overcome the limitations in the prior art described above, andto overcome other limitations that will become apparent upon reading andunderstanding the present specification, the present invention disclosesa method, apparatus, and article of manufacture for a computerimplemented technique for processing XML documents.

[0016] In accordance with one aspect of the present invention, data isstored in a data store connected to a computer. A main table is createdhaving a column for storing a document, wherein the document has one ormore elements or attributes. One or more side tables are created,wherein each side table stores one or more elements or attributes. Then,the side tables are used to locate data in the main table.

[0017] In accordance with another aspect of the present invention, datastored on a data storage device that is connected to a computer istransformed. A query that selects data in the data storage device isreceived. The selected data is retrieved into a work space. Then, one ormore XML documents are generated to consist of the selected data.

[0018] In accordance with yet another aspect of the present invention,data stored on a data storage device that is connected to a computer istransformed. Initially. a document object model tree is generated usinga document access definition. The document object model tree istraversed to obtain information to retrieve relational data. Therelational data is mapped to one or more XML documents.

[0019] In accordance with a further aspect of the present invention,data stored on a data store that is connected to a computer istransformed. Initially an XML document containing XML data is received.A document access definition that identifies one or more relationaltables and columns is received. The XML data is mapped from theapplication DTD to the relational tables and columns using the documentaccess definition based on the XPath data model.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] Referring now to the drawings in which like reference numbersrepresent corresponding parts throughout:

[0021]FIG. 1 schematically illustrates the hardware environment of anembodiment of the present invention, and more particularly, illustratesa typical distributed computer system using a network.

[0022]FIG. 2 is a diagram illustrating a computer software environmentthat could be used in accordance with the present invention.

[0023]FIG. 3 illustrates an application or main table and its four sidetables.

[0024]FIG. 4 is a flow diagram illustrating steps performed by the XMLSystem in creating and maintaining XML document data as column data.

[0025]FIG. 5 is a flow diagram of steps performed by the XML System toenable a column.

[0026]FIG. 6 is a flow diagram of steps performed by the XML System todisable a column.

[0027]FIG. 7 is a diagram illustrating code organization to compose XMLdocuments.

[0028]FIG. 8 is a block diagram illustrating components of the XMLSystem in one embodiment of the invention.

[0029]FIG. 9 is a flow diagram illustrating the steps performed by theXML System to transform relational data into one or more XML documentsusing SQL mapping.

[0030]FIG. 10 is a flow diagram illustrating the process performed bythe XML system using RDB_node mapping to compose XML documents.

[0031]FIG. 11 is a flow diagram illustrating the steps performed by theXML System to decompose XML documents with application specificmappings.

DETAILED DESCRIPTION

[0032] In the following description of an embodiment of the invention,reference is made to the accompanying drawings which form a part hereofand which is shown by way of illustration a specific embodiment in whichthe invention may be practiced. It is to be understood that otherembodiments may be utilized as structural changes may be made withoutdeparting from the scope of the present invention.

A. HARDWARE ARCHITECTURE

[0033]FIG. 1 schematically illustrates the hardware environment of anembodiment of the present invention, and more particularly, illustratesa typical distributed computer system using a network 100 to connectclient computers 102 executing client applications to a server computer104 executing software and other computer programs, and to connect theserver system 104 to data sources 106. A typical combination ofresources may include client computers 102 that are personal computersor workstations, and a server computer 104 that is a personal computer,workstation, minicomputer, or mainframe. These systems are coupled toone another by various networks, including LANs, WANs, SNA networks, andthe Internet. Each client computer 102 and the server computer 104additionally comprise an operating system and one or more computerprograms.

[0034] A client computer 102 typically executes a client application andis coupled to a server computer 104 executing one or more serversoftware. The server software may include an XML system 110. The servercomputer 104 also uses a data store interface and, possibly, othercomputer programs, for connecting to the data sources 106. The clientcomputer 102 is bi-directionally coupled with the server computer 104over a line or via a wireless system. In turn, the server computer 104is bi-directionally coupled with data sources 106.

[0035] The data store interface may be connected to a DatabaseManagement System (DBMS), which supports access to a data store 106 byexecuting, for example, RDBMS software. The interface and DBMS may belocated at the server computer 104 or may be located on one or moreseparate machines. The data sources 106 may be geographicallydistributed.

[0036] The operating system and computer programs are comprised ofinstructions which, when read and executed by the client and servercomputers 102 and 104, cause the client and server computers 102 and 104to perform the steps necessary to implement and/or use the presentinvention. Generally, the operating system and computer programs aretangibly embodied in and/or readable from a device, carrier, or media,such as memory, other data storage devices, and/or data communicationsdevices. Under control of the operating system. the computer programsmay be loaded from memory, other data storage devices and/or datacommunications devices into the memory of the computer for use duringactual operations.

[0037] Thus, the present invention may be implemented as a method,apparatus, or article of manufacture using standard programming and/orengineering techniques to produce software, firmware, hardware, or anycombination thereof. The term “article of manufacture” (oralternatively, “computer program product”) as used herein is intended toencompass a computer program accessible from any computer-readabledevice, carrier, or media. Of course, those skilled in the art willrecognize many modifications may be made to this configuration withoutdeparting from the scope of the present invention.

[0038] Those skilled in the art will recognize that the exemplaryenvironment illustrated in FIG. 1 is not intended to limit the presentinvention. Indeed, those skilled in the art will recognize that otheralternative hardware environments may be used without departing from thescope of the present invention.

B. XML BACKGROUND

[0039] Extensible Markup Language (XML) is a subset of StandardGeneralized Markup Language (SGML). XML works in conjunction withExtensible Stylesheet Language Transformation, (XSLT) and ExtensibleMarkup Language Path (XPath). XML may also work in conjunction with aDocument Object Model (DOM) or Namespace.

[0040] Extensible Markup Language (XML) is a subset of StandardGeneralized Markup Language (SGML). XML is described in XML 1.0, foundat the following web site: http://www.w3.org/TR/REC-xml. ExtensibleMarkup Language (XML) is a set of rules or guidelines for designing textformats for structured data using tags. Additional detail may be foundat the following web site: http://www.w3.org/XML/1999/XML-in-10-points.For interoperability, domain-specific tags called a vocabulary can bestandardized using a Document Type Definition, so that applications inthat domain understand the meaning of the tags.

[0041] Extensible Style Language Transformer or XSLT is a language fortransforming XML documents into other XML documents. The XSLTspecification defines the syntax and semantics of the XSLT language.XSLT-defined elements are distinguished by belonging to a specific XMLnamespace, which is referred to as the XSLT namespace. A transformationexpressed in XSLT describes rules for transforming a source tree into aresult tree. Further detail about XSLT may be found athttp://www.w3.org/TR/xslt.

[0042] XML Path or XPath addresses parts of an XML document. XPath getsits name from its use of a path notation as in URLs for navigatingthrough the hierarchical structure of an XML document. Further detailabout XML path may be found at http://www.w3.org/TR/xpath.

[0043] A Document Object Model (DOM) is a standard set of function callsfor manipulating XML files from a programming language. Additionaldetail may be found at the following web site:http://www.w3.org/TR/REC-DOM-Level-1/.

C. OVERVIEW OF THE XML SYSTEM

[0044] In one embodiment of the invention, the XML System comprises theXML Extender from International Business Machines, Corporation, ofArmonk, N.Y. The XML System offers the capability of XML storage anddata interchange. By storage, the XML System provides mechanisms forstoring and retrieving XML documents in a relational database (e.g.,DB2® from International Business Machines, Corporation) and searchingthe content of XML with high performance. By data interchange, the XMLSystem provides a mapping between new and existing relational tables andXML formatted documents. Thus, the XML System allows customers to doe-business anywhere, enabling XML with Business to Business (B2B) andBusiness to Consumer (B2C) applications. For B2B applications,application data flows between database servers, via any network (e.g.,the internet or an intranet), either directly without client interactionor indirectly via some client systems. For B2C applications, applicationdata flows between a consumer at, for example, a workstation, and aserver connected via a network (e.g., between database servers and webclients via the internet). Thus, the XML System supports Business toBusiness (B2B) and Business to Client (B2C) applications. In both cases,the following requirements will apply:

[0045] Performance

[0046] Scalability

[0047] Integration with existing business logic

[0048] Smart query support

[0049] Legacy file support

[0050] Developer efficiency

[0051]FIG. 2 is a diagram illustrating a computer hardware environmentthat could be used in accordance with the present invention. In oneembodiment, the DB2 XML Extender 200, a product from InternationalBusiness Machines, Corporation, is at the center of the architecture. Anapplication program 202 and a document access definition (DAD) 204 arereceived by the DB2 XML Extender 200. The DB2 XML Extender 200 takes anXML document 206 as the input, stores the XML document 206 in DB2 210(i.e., a relational database) either internally inside DB2 210 orexternally on the file system as one or more XML files 208. Then thestored XML document 206 can also be retrieved from DB2 210 or the filesystem through the DB2 Extender 200. The processing performed by the DB2XML Extender 200 will be described in more detail below.

[0052] In another embodiment, an application program 202 and a documentaccess definition (DAD) 204 are received by the DB2 XML Extender 200.The DB2 XML Extender 200 takes an XML document 206 as input, decomposesthe XML document 206 into fragmented data and stores the fragmented datain DB2 210 (i.e., a relational database). Then, the fragmented datastored in DB2 210 can be regenerated from DB2 210 through the DB2Extender 200. The processing performed by the DB2 XML Extender 200 willbe described in more detail below.

[0053] Those skilled in the art will recognize that the environmentillustrated in FIG. 2 is not intended to limit the present invention.Indeed, those skilled in the art will recognize that other alternativehardware environments may be used without departing from the scope ofthe present invention.

[0054] C.1 Applications

[0055] Different types of applications can benefit from the use of theXML System. Some illustrations follow:

[0056] Business to Business (B2B) Applications for E-Commerce:

[0057] B2B applications mainly use XML as their interchange format, suchas Electronic Data Interchange (EDI). The XML System enables maintainingnative XML formatted documents, as well as mapping data into/fromrelational tables. With native XML formatted documents XML enablesstoring entire XML documents into a database and searching on knownelements or attributes. With mapping, XML System enables an applicationbuilder who knows the relational data model of particular businesstables to custom map XML content to or from existing tables.

[0058] Web Information Retrieval Applications:

[0059] These are B2C applications which are often used in interactiveWeb sites, such as sites for insurance and real-estate industries. TheXML documents are usually not very large in size, but have structuredinformation.

[0060] The XML System enables storing entire XML documents into adatabase and using SQL to do a fast search on desired XML elements orattributes with rich data types. Range search for rich data types isoften important. Additionally the XML System enables retrieving datafrom existing business tables and from XML documents and putting them ona web site for viewing.

[0061] For example, an insurance company may set up a call center systemin which agents retrieve phone calls from their customers. Theinformation is collected, and the case is archived. The XML System isused to store entire XML documents in a database. Then, an insuranceagent can easily display an insurance case on a screen. The XML Systemalso provides a fast and powerful search of these insurance cases, sothe insurance agent can quickly retrieve information while still on thephone with a customer. Additionally, alternative ways of searching forinformation, i.e. numbers, text wildcards, key words, etc., are providedby XML System.

[0062] Content Management:

[0063] This type of application provides advanced content managementfunctions to a user. A user could use XML System as physical storage,and have fast search with indexing. The XML documents are usually largein size. In some cases, it is desirable to partition the XML documentsinto multiple pieces and perform update in place.

[0064] As an extender to DB2®, XML System enhances DB2® functionalityfor XML enablement. That is, XML System enables use of SQL as the mainaccess technique, along with database features of: stored procedures,user defined types (UDT) and user defined functions (UDF).

[0065] The XML System meets the following requirements:

[0066] Physical Storage: for entire document, or shredded structureddata, with data types.

[0067] Support of flat files: allows data to be stored in flat files,and imported/exported to/from database.

[0068] Access via SQL: supports field search, full text search,structural search.

[0069] Indexing facility: builds on different data types for betterquery performance.

[0070] Updates XML element/attribute.

[0071] Mapping to/from relational: composes/decomposes XML documentsfrom/to data store in relational tables.

[0072] NLV support: support of double byte characters.

[0073] C.2 XColumns and XCollections

[0074] XML System provides good data and metadata management solutionsto handle traditional and non-traditional data. With the content ofstructured XML documents in a database, a user can combine structuredXML information with traditional relational data. Based on theapplication, a user can choose whether to store entire XML documents ina database as a non-traditional distinct data type or map the XMLcontent as traditional data in relational tables. For non-traditionalXML data types, the XML System adds the power to search rich data typesof XML element or attribute values. For traditional SQL data, that iseither decomposed from incoming XML documents or in existing relationaltables to be used to create outgoing XML documents, the XML Systemprovides a custom mapping mechanism to allow the transformation betweenXML documents and relational data.

[0075] The XML System offers the flexibility to store entire XMLdocuments as column data or transform between XML documents and data inexisting tables. The transformation includes decomposing an XML documentinto one or multiple pieces and storing the pieces in the form ofrelational data, as well as, composing XML documents from the data inexisting relational tables. A user can decide how structured XMLdocuments are to be stored or created through a Document AccessDefinition (DAD).

[0076] The DAD itself is an XML formatted document. The DAD associatesXML documents to a database through two major access and storagetechniques by defining elements Xcolumn and Xcollection. Xcolumn defineshow to store and retrieve entire XML documents as column data of the XMLuser defined type (UDT). An XML column is a column of XML System's userdefined type (UDT). Applications can include an XML column in any usertable. Operations on the XML column can be processed after the column isenabled with the XML System. A user can access XML column data mainlythrough the SQL statements and XML System's user defined function (UDF).With the different access and storage techniques, the XML Systemprovides the flexibility of XML data storage and retrieval.

[0077] In particular, an XML column is used to store entire XMLdocuments in the native XML format. This approach treats XML format asan non-traditional data type and offers user defined types (UDTs) anduser defined functions (IDFs) for a fast, versatile, and intelligenttechnique for searching through XML documents. The XML System givesapplications the freedom to specify a list of XML elements/attributes asgeneral SQL data types for fast search. The XML System will extractthese values from the XML documents and store them in side tables sothat a user can create indices on them. The application can query theseside tables or join them with the application (i.e., “main”) table to doa fast search. For example, a user can input a query such as: “give meall the documents whose prices are greater than $2500.00”, providing2500.00 is the value of an XML element or attribute inside the XMLdocuments.

[0078] The XML System provides several user defined types (UDTs) for XMLcolumns. These data types are used to identify the storage types of XMLdocuments in the application table. The XML System supports legacy flatfiles, and a user is not restricted to storing XML documents inside adatabase. A user can also store XML documents as files on the local orremote file system, specified by a URL or a local file name.

[0079] The XML System provides powerful user-defined function (UDF)s tostore and retrieve XML documents in XML columns, as well as to extractXML element/attribute values. The UDFs are applied to XML user definedtypes (UDTs), thus, these are mainly used for XML columns.

[0080] An Xcollection defines how to decompose XML documents into acollection of relational tables or to compose XML documents from acollection of relational tables An XML collection is a virtual name of aset of relational tables. Applications can enable an XML collection ofany user tables. These user tables can be existing tables of legacybusiness data or the ones newly created by the XML System. A user canaccess XML collection data mainly through the stored procedures providedby the XML System.

[0081] An XML collection is used to transform data between databasetables and XML documents. An XML collection achieves the goal of datainterchange via XML. For applications that want to compose or decomposeXML documents from/into a set of relational tables, the XML Systemoffers a technique to enable an XML collection through a Document AccessDefinition (DAD). In the Document Access Definition, applications canmake a custom mapping between database column data in new or existingtables to XML elements or attributes. The access to an XML collection isby calling XML System's stored procedures or directly querying to thetables of the collection.

[0082] The XML System also allows overrides of query conditionsexplicity or implicitly defined in the DAD, by parsing the SQL or XMLXPath based override parameter to the composition stored procedures. Inthis way, it supports dynamic query for generating XML documents.

[0083] With the XML System, an application can:

[0084] Store entire XML documents as column data in an applicationtable, either internally or externally as a local file or URL, whileextracting desired XML element or attribute values into side tables forsearch.

[0085] Compose or decompose contents of XML documents from/into an XMLcollection which consists of one or more relational tables.

[0086] Perform fast search on XML elements or attributes of SQL generaldata types by converting character string in XML documents to SQL datatypes for indexing.

[0087] Update the content of an XML element, or the value of an XMLattribute.

[0088] Extract XML elements or attributes dynamically in SQL query.

[0089] Validate XML documents during insertion and update.

[0090] The XML System also serves as an XML document type definition(DTD) repository. When a database is XML enabled, a DTD Reference Table(DTD_REF) is created. Each rots of this table represents a DTD, withadditional metadata information. This table is accessible by users, andallows them to insert their own DTDs. The DTDs in the DTD_REF table areused to validate XML documents and to help applications to define adocument access definition (DAD).

[0091] C.3 Terminology

[0092] This section clarifies some terminology used in thisspecification.

[0093] The XML System uses a subset of Extensive Stylesheet LanguageTransformation (XSLT) and XML Path Language (XPath), Version 1.0, theW3C working draft of 06/17/1999, to identify XML elements or attributes.The content of the XPath is originally in the XSLT and now is referredby the XSLT, as a part of the stylesheet transformation language.Location path is used to define XML elements and attributes. TheXSLT/XPath's abbreviated syntax of the absolute location path is used.

[0094] The following is not a formal data model, but a set ofabbreviated syntax. The notation of the absolute location path withabbreviated syntax supported by the XML System is listed below.

[0095] This section clarifies some terminology used in thisspecification.

[0096] The XML System uses a subset of Extensive Stylesheet LanguageTransformation (XSLT) and XML Path Language (XPath), Version 1.0, theW3C working draft of 06/17/1999, to identify XML elements or attributes.The content of the XPath is originally in the XSLT and now it isreferred to by XSLT as a part of the stylesheet transformation language.Previously, the term “path expression” was used. Now, a subset of theterm location path is used in XSLT and XPath to define XML elements andattributes. The XSLT XPath's abbreviated syntax of the absolute locationpath is used.

[0097] The following is not a formal data model. but a set ofabbreviated syntax. An absolute location path with abbreviated syntax islisted below. This is supported by the XML System. Again, these are notformal definitions.

[0098] a. “/”:

[0099] Represents the XML root element.

[0100] b. “/tag1”:

[0101] Represents the element tag1 under root.

[0102] c. “/tag1/tag2/ . . . / tagn”:

[0103] Represents an element with the name tagn as the child with thedescending chain from root, tag1, tag2, . . . , tagn-1

[0104] d. “//tagn”

[0105] Represents any element with the name tagn, where “//” denoteszero or more arbitrary tags.

[0106] e. “/tag1//tagn”

[0107] Represents any element with the name tagn which is a child ofelement with the name tag1 under root, where “//” denotes zero or morearbitrary tags.

[0108] f. “/tag1/tag2/@attr1”

[0109] Represents the attribute attr1 of element with the name tag2 as achild of element tag1 under root.

[0110] g. “/tag1 /tag2/[@attr1=“5”]”

[0111] Represents the element with the name tag2 whose attribute attr1has the value ‘5’ and it is a child of element with the name tag1 underroot.

[0112] h. “/tag1/tag2/[@attr1=“5”]/ . . . /tagn”

[0113] Represents the element with the name tagn which is a child of thedescending chain from root, tag1, tag2, . . . where the attribute attr1of tag2 has the value ‘5’.

[0114] i. “/tag1/tag2/tag3=“Los Angeles”/ . . . /tagn”

[0115] Represents the element with the name tagn which is a child of thedescending chain from root, tag1, tag2 . . .where tag3 has the value“Los Angeles”.

[0116] j. “/tag1/tag2/*[@attr1=“5”

[0117] Represents all elements as children of element “/tag1/tag2” withattr1 of value “5”.

[0118] There are restrictions on the location path when used by the XMLSystem, and these are listed in the table below. TABLE Restriction ofLocation Path Supported Use of the Location Path Location Path SupportedExtracting UDFs a-j Text Extender's search UDF a-j DAD column definitionc, f (simple location path)

[0119] Note that there is a restriction in the DAD column definitionbecause there is a one-to-one mapping between an element or attribute toa column.

[0120] The term simple location path refers to the c and f notations inthe table for Restriction of Location Path Supported. The simplelocation path is a sequence of element type names connected by the “/”notation. Each element type may be qualified by its attribute values.TABLE Simple Location Path of an Element and an Attribute Subjectlocation path Description XML /tag_1/tag_2/.../tag_n-1/tag_n an elementcontent Element identified by the tag_n and its parents XML/tag_1/tag_2/.../tag_n-1/tag_n/@attr1 an attribute with Attribute name“attr1” of the element identified by tag_n and it parents

[0121] The location path identifies the structure part that indicatesthe document context to be found. An empty path signals the structure tosearch or extract against is the whole document (same effect as if thelocation path is the root element).

[0122] The XML System provides users the ability to create SQL querieson XML documents. Based on the nature of XML documents and thefunctionality of the XML System. the following terminology is used:Document Access The definition used to enable an XML Definition(DAD):System column or an XML collection, which is XML formatted. Partition:The term partition used means the full partition. In other words, theunion of all partitioned parts forms the original document. Locationpath: A subset of the abbreviated syntax of the location path defined byXPath. A sequence of XML tags to identify an XML element or attribute.It is used in the extracting UDFs to identify the subject to beextracted. The terms of path expression and location path may be usedinterchangeably. Side table: Additional tables created by the XML Systemto store searchable elements/attributes for an enabled XML Column. ValidDocument: An XML document that has an associated DTD. To be valid, theXML document cannot violate the syntactic rules specified in its DTD.Well-formed document: An XML document that does not contain a DTD. Adocument with a DTD (valid) must also be well-formed. XML Attribute: Anyattribute specified by the ATTLIST under the XML element in the DTD. TheXML System uses the location path to identify an attribute. XML Column:A column in the application table of the XML System UDT type. The termof XML enabled column and XML column will be used interchangeably. XMLCollection: A collection of relational tables which present the data tocompose XML documents or to be decomposed from XML documents. XMLElement: Any XML tag or ELEMENT as specified in the XML DTD. The XMLSystem uses the location path to identify an element. XML Tag: Any validXML markup language tag, mainly the XML element. The term tag andelement are used interchangeably. XML Table: An XML Table is anapplication table which includes XML System column(s). The terms XMLenabled table and XML table are used interchangeably. XML Object: Theterms XML Object and XML document are used interchangeably. XML UDT:User defined type provided by the XML System. XML UDF: User definedfunction provided by the XML System.

[0123] C.4 Example of an XML DTD

[0124] The following DTD is provided as an example:

[0125] LineItem.dtd <?xml encoding=“US-ASCII”?> <!ELEMENT Order(customer,Part−)> <!ATTLIST Order Key CDATA #REQUIRED> <!ELEMENTCustomer #PCDATA> <!ELEMENT Part(Quantity,ExtendedPrice,Tax,Shipment*)><!ATTLIST Part Key CDATA> <!ELEMENT Quantity (#PCDATA)> <!ELEMENTExtendedPrice (#PCDATA)> <!ELEMENT Tax (#PCDATA)> <!ELEMENT Shipment(ShipDate,ShipMode,Comment)> <!ELEMENT ShipDate (#PCDATA)> <!ELEMENTShipMode (#PCDATA)> <!ELEMENT Comment (#PCDATA)>

[0126] In the above LineItem.dtd, the term LineItem.dtd is the title ofthe Document Type Definition. The term <?xml encoding=“US-ASCII”?>indicates that encodin; is in US-ASCII. The terms beginning with ELEMENTrefer to elements of an XML document, and the terms beginning withATTLIST refer to attributes of an XML document. The DTD is used toverify a Document Access Definition.

[0127] C.5 Example of an XML Document

[0128] The following is an example of an XML formatted document:

[0129] order.xml <?xml version=“1.0”?> <!DOCTYPE Litem_DTD SYSTEM“E:\dxx\test\dtd\LineItem.dtd”> <Order Key=“1”> <Customer>GeneralMotor</Customer> <Part Key=“156”> <Quantity>17</Quantity><ExtendedPrice>17954.55</ExtendedPrice> <Tax>0.02</Tax> <Shipment><ShipDate>1998-03-13</ShipDate> <ShipMode>TRUCK</ShipMode> <Comment>Thisis the first shipment to service of GM</Comment> </Shipment> <Shipment><ShipDate>1999-01-16</ShipDate> <ShipMode>FEDEX</ShipMode> <Comment>Thisthe second shipment to service of GM.</Comment> </Shipment> </Part><Part Key=“68”> <Quantity>36</Quantity><ExtendedPrice>34850.16</ExtendedPrice> <Tax>0.06</Tax> <Shipment><ShipDate>1996-04-12</ShipDate> <ShipMode>BOAT</ShipMode> <Comment>Thisshipment is requested by a call, from GM marketing. <Comment></Shipment> <shipment> <ShipDate>1998-08-19</ShipDate><ShipMode>AIR</ShipMode> <Comment>This shipment is ordered by anemail.<Comment> <Shipment> </Part> <Order>

[0130] In the above XML document, the term order.xml is the title of theXML document. The term <?xml version=“1.0”?> indicates that thisdocument is based on XML Version 1.0. The term <!DOCTYPE Litem_DTDSYSTEM “E:\dxx\test\dtd\LineItem.dtd”> is text for the XML document typedefinition and references the example Document Type Definition, entitledLineItem.dtd, in C.4, which is used for validation.

[0131] The remaining terms define the data in the XML document. Forexample, the term <Quantity>17<Quantity> indicates that quantity has avalue of 17. Also, note that <Quantity> without a slash at the beginningdefines a start tag and <Quantity> with a slash at the beginning definesan end tag. Similarly, other terms in the XML document use such tags.

[0132] C.6 The Document Access Definition (DAD)

[0133] A user decides how XML document data is to be accessed in adatabase. That is, the 150

[0134] user defines a DAD. With the help of a Graphical User Interface(GUI) tool, the user can create a DAD to define a mapping and indexingscheme.

[0135] A Document Access Definition(DAD) is defined by the followingDocument Type Definition (DTD):

[0136] dad.dtd <?xml encoding=“US-ASCII”?> <!ELEMENT DAD (dtdid?,validation, (Xcolumn | Xcollection))> <!ELEMENT dtdid (#PCDATA)><!ELEMENT validation (#PCDATA)> <!ELEMENT Xcolumn (table*)> <!ELEMENTtable (column*)> <!ATTLIST table name CDATA #REQUIRED key CDATA #IMPLIEDorderBy CDATA #IMPLIED> <!ELEMENT column EMPTY> <!ATTLIST column nameCDATA #REQUIRED type CDATA #IMPLIED path CDATA #IMPLIED multi_occurrenceCDATA #IMPLIED> <!ELEMENT Xcollection (SQL_stmt*, prolog, doctype,root_node)> <!ELEMENT SQL_stint (#PCDATA)> <!ELEMENT prolog (#PCDATA)><!ELEMENT doctype (#PCDATA | RDB_node)”> <!ELEMENT root_node(element_node)> <!ELEMENT element_node (RDB_node?, attribute_node*,text_node?, element_node*, namespace_node*, process_instruction_node*,comment_node*)> <!ATTLIST element_node name CDATA #REQUIRED ID CDATA#IMPLIED multi_occurrence CDATA “NO” BASE_URI CDATA #IMPLIED> <!ELEMENTattribute_node (column | RDB_node)> <!ATTLIST attribute_node name CDATA#REQUIRED> <!ELEMENT text_node (column | RDB_node)> <!ELEMENT RDB_node(table+, column?, condition?)> <!ELEMENT condition (#PCDATA)> <!ELEMENTcomment_node (#PCDATA)> <!ELEMENT namespace_node EMPTY> <!ATTLISTnamespace_node name CDATA #IMPLIED value  CDATA #IMPLIED> <!ELEMENTprocess_instruction_node (#PCDATA)>

[0137] The XML System Administration GUI will provide an interface tocreate DAD files. The DAD itself is a tree structured XML document. Theimportant elements and attributes of the DAD are:

[0138] DTDID:

[0139] The identifier of the DTD stored in the dtd_ref table. Itrepresents the DTD which validates the XML documents or guides themapping between XML collection tables and XML documents. DTDID must bespecified for XML collections. For XML columns, it is optional and isonly needed if you want to create side tables for indexing onelements/attributes or validate nets XML documents. The DTDID must bethe same as the SYSTEM ID specified in the “doctype” of the XMLdocuments.

[0140] Validation

[0141] For validating XML documents with the DTD, and “No” for novalidation. If “Yes”, then the DTDID must also be specified.

[0142] Xcolumn

[0143] An Xcolumn defines the indexing scheme for an XML Column. It iscomposed by zero or more tables.

[0144] table:

[0145] The relational side table(s) created for indexing elements orattributes of documents stored in an XML column. You can have one ormore tables. A table is specified by:

[0146] name:

[0147] name of the side table.

[0148] column:

[0149] The column of the side table, which contains the value of alocation path of the specified type.

[0150] name: name of the column. It is the alias name of the locationpath which identifies an element or attribute,

[0151] type: the data type of the column. It can be any SQL data type.

[0152] path: the location path of an XML element or attribute. Only asimple location path defined in Section C.3 is allowed here.

[0153] multi_occurrence: “YES” or “NO” to specify whether this elementor attribute will have in one XML document. For Xcolumn, ifmulti_occurrence is specified as “YES”, the XML System will add anothercolumn “DXX_SEQNO” with type Integer in the side table which this columnbelong to. This DXX_SEQNO keeps track of the order of elements occurredfor the path expression in each inserted XML documents. With DXX_SEQNO,the user can retrieve a list of the elements with the same order as theoriginal XML document using “ORDER BY DXX_SEQNO” in SQL.

[0154] Xcollection

[0155] The Xcollection defines mapping between XML documents and an XMLcollection of relational tables. It is composed by the followingelements:

[0156] SQL stmt

[0157] The SQL statement to specify the operation needed to achieve themapping. It must be a valid SQL statement. It is only needed forcomposition, and only one SQL_stmt of query is allowed.

[0158] objids

[0159] A list of identifiers, each of which conceptually identifies arow object in the database table, so that the row to be selected isordered by this unique value. It is only needed when SQL_stmt issupplied. The ID can be a column name, or a value generated from thegenerate_unique( ) function or a UDF. It is recommended but notnecessary to be the primary key of the table.

[0160] prolog

[0161] Text for the XML prolog. The same prolog is supplied to alldocuments in the entire collection. It is a fixed text. This becauseonly XML 1.0 is supported. and UDB® only supports UTF-8.

[0162] doctype

[0163] Text for the XML document type definition. The doctype can bespecified in one of the following two ways:

[0164] The same doctype is supplied to all documents in the entirecollection. In this case, it is a fixed text.

[0165] When decomposing, the doctype can be stored as a column data of atable. In this case, the RDB_node should be specified.

[0166] root node

[0167] The virtual root node which must has one and only oneelement_node. The element_node under the root node is actually theroot_node of the XML document.

[0168] RDB_node:

[0169] The node defines the mapping between an XML element or attributeand relational data. It consists of:

[0170] table:

[0171] name: the name of a relational table in which the data of an XMLelement or attribute reside.

[0172] key: the primary single key of the table. It must be specifiedfor decomposition.

[0173] For the root element_node. all tables storing its attribute orall child element data should be specified.

[0174] orderBy: names of columns that determine the sequence order ofmultiple-occurring element text or attribute value when generating XMLdocuments.

[0175] column:

[0176] It must be specified for text_node or attribute_node. but not forthe element_node.

[0177] name: name of the column which contains the value of an XMLelement or attribute. It must be specified for both composition anddecomposition.

[0178] type: the data type of the column. It is needed only fordecomposition.

[0179] path: the location path of the element or attribute. It is notneeded for Xcollection, only for Xcolumn.

[0180] multi_occurrence: multiple occurrence of the element orattribute.

[0181] condition: the predicate to specify query condition. It servestwo purposes:

[0182] In RDB_node of a text_node or attribute_node: if specified, itqualifies the condition to select the column data to be used to composeor decompose XML element text or attribute value. It is optional.

[0183] In RDB_node of the root element node: if more than one tables aresupplied, it must be specified as the condition to join tables.

[0184] element_node

[0185] Representing an XML element. It must be defined in the specifiedDTD. For the RDB_node mapping, the root element_node must have aRDB_node to specify all tables containing XML data for itself and allits children nodes. It can have zero or more attribute_nodes and childelement_nodes, as well as zero or one text_node. In the next release, anelement_ode can also contain namespace_nodes, process_instruction_nodesand comment_node.

[0186] An element_node is defined by:

[0187] Attributes:

[0188] name: The name of the XML element. It is the tag name.

[0189] ID: The unique ID. This is adapted from XPTH.

[0190] BASE_URI: The base URI for the name space. This is also adaptedfrom XPTH.

[0191] Optional RDB_node:

[0192] The RDB_node is only needed for the root element_node when usingRDB_node mapping. In this case, all tables involved to generate ordecompose XML documents must be specified. The column is not needed. Thecondition must be specified to show the join relationship among tables.

[0193] Optional child nodes:

[0194] An element_node can also have the following child nodes:

[0195] element_node(s): representing child element(s) of this element,

[0196] attribute_node(s): representing attribute(s) of this element;

[0197] text node: represent the CDATA text of this element,

[0198] comment_node: representing the comment for this element,

[0199] namespace node: representing the namespace of this element,

[0200] process_instruction_node: representing the process instruction,

[0201] attribute node:

[0202] Representing an XML attribute. It is the node defining themapping between an XML attribute and the column data in a relationaltable. It must has a name, and a column or a RDB_node.

[0203] Attribute:

[0204] name: the name of the attribute. It must be defined in the DTD.

[0205] Column or RDB_node:

[0206] Column: needed for the SQL mapping. In this case, the column mustbe in the SQL_stmt's SELECT clause.

[0207] RDB_node: needed for the RDB_node mapping. The node defines themapping between this attribute and the column data in the relationaltable. The table and column must be specified. The condition isoptional.

[0208] text_node:

[0209] Representing the text content of an XML element. It is the nodedefining the mapping between an XML element content and the column datain a relational table. It must be defined by a column or a RDB_node.

[0210] Column: needed for the SQL mapping. In this case, the column mustbe in the SQL_stmt's SELECT clause.

[0211] RDB_node: needed for the RDB_node mapping. The node defines themapping between this text content and the column data in the relationaltable. The table and column must be specified. The condition isoptional.

D. CREATING METADATA FOR FAST SEARCH OF XML DOCUMENTS STORED AS COLUMNDATA

[0212] One embodiment of the invention provides an XML System whichsolves the problem of fast searching and indexing of XMLelement/attribute values of XML documents when they are stored inside adatabase as column data.

[0213] An XML document is a structured document. XML lets a userstructure a document by elements or attributes (e.g., title or author).Once a document is structured in this manner, a structured search man beperformed based on element or attribute values (or content).

[0214] The embodiment of the invention converts the characters ofelement/attribute values to any general SQL data type. Additionally, theembodiment of the invention provides a technique for performing a rangesearch on the data. That means the element or attribute values areconverted to SQL types (e.g., number of pages may be an integer). Withthis embodiment of the invention, indices can be created on XMLelement/attribute values, thus the search operation is scalable.

[0215] The embodiment of the invention permits application programmersto define a Data Access Definition (DAD) which identifies the XMLelements or attributes that need to be indexed and defines the mappingbetween XML elements or attributes to columns in one or more sidetables. The DAD is an XML formatted document that is used to specifywithin an XML document which elements or attributes are to be searched.The DAD also provides a location path or XPath. For example, if elementsof a book are structured as follows:

[0216] |-----Book

[0217] |-----Title

[0218] |-----Author

[0219] The location path for the above structure would be:/Book/Title/Author.

[0220] Additionally, the embodiment of the invention stores XML documentdata in an application table, while storing particular elements orattributes in side tables. The data stored in the side tables isreferred to as “metadata” and is used to search for elements orattributes in the XML documents stored as column data in the applicationtable. During the enabling of a column which contains XML documents,side tables are created (based on the DAD) to store duplicate data ofthese elements or attributes. Several triggers are created so thatvalues of these elements or attributes are extracted when operations areperformed on XML documents in columns of an application table. Theoperations include, for example, insert operations on the applicationtable, which trigger insert operations to also store the inserted XMLdata into the side tables. Triggers also manage the synchronization ofXML data between the side table data during the deleting and updatingoperations on the column containing the XML documents in the applicationtable.

[0221] D.1 Indexing for Searching XML Columns

[0222] The indexing mechanism is applied on XML columns. In particular,the indexing mechanism discussed here is a technique to create an indexon XML element or attribute values when entire XML documents are storedin XML columns.

[0223] With a large collection of XML documents, search performance is acritical user requirement. Index support provides fast query performanceat the cost of slower update performance due to index updates. The XMLSystem provides an indexing mechanism that allows search predicates atquery-time to be evaluated through indices, without reading documentsources.

[0224] The XML column indexing mechanism allows frequently queried dataof general data types, such as integer, decimal, or date, to be indexedusing the native database index supports from the database engine. Thisis achieved by extracting the values of XML elements or attributes fromXML documents, storing them in the side tables. then allowingapplication programmers to create indices on these side tables. In aDAD, a user can define Xcolumns by specifying each column of a sidetable with a location path that identifies an XML element or attributeand a desired SQL data type. The XML System then will populate theseside tables when data is inserted into the application table. Anapplication can create an index on these columns for fast search, usingthe database B-tree indexing technology. The technique and options forcreating an index may vary across platforms. Application programmershave the freedom to create a desired index as they usually do with adatabase on their platform.

[0225] For elements/attributes in an XML document which occur multipletimes, a separate table is created for each XML element/attribute withmultiple occurrences, due to the complex structure of XML documents.

[0226] For example, a user may want to create an index on‘/Order/Part/ExtendedPrice’, and specify ‘/Order/Part/ExtendedPrice’ tobe of data type REAL. In this case, XML System will store the value of‘/Order/Part/ExtendedPrice’ in the specified column ‘price’ in a sidetable. Multiple indices on an XML column are allowed. In the example, auser can create two columns in two side tables, one for ‘ExtendedPrice’and one for “ShipDate”.

[0227] When side tables are created, they are tied together with themain (or application) table through the notion of root_id. A user candecide whether the primary key of the application table is to be the“root_id”. If the primary key does not exist in the application table,or for some reason a user doesn't want to use the primary key, then XMLSystem will alter application table to add a column DXXROOT_ID forstoring a unique identifier created at insertion time (i.e., when datais inserted into the application or main table). All side tables willhave a “DXXROOT_ID” column and have the unique identifiers stored. Ifthe primary key is used as the root_id, then all side tables will have acolumn with the same name and type as the primary key column in theapplication table, and the values of the primary keys are stored.

[0228] D.2 Sample DAD for an XML Column

[0229] Assuming the XML documents need to be stored are like the oneshown in C.5. Example of an XML Document, the following example DAD willstore the XML documents in an XML column and create several side tablesfor indexing.

[0230] Litem_DAD1.dad Litem_DAD1.dad  <?xml version=“1.0”?> <!DOCTYPEOrder SYSTEM “E:\dtd\dxxdad.dtd”> <DAD><dtdid>E:\dtd\lineItem.dtd</dtdid> <validation>YES</validation><Xcolumn> <table name=“order_tab“> <column name=“order_key”type=“integer” path=“/Order/@Key” multi_occurrence=“NO”/> <columnname=“customer” type=“varchar(50)” path=“/Order/Customer”multi_occurrence=“NO”/> </table> <table name=“part_tab”> <columnname=“part_key” type=“integer” path=“/Order/Part/@Key”multi_occurrence=“YES”/> </table> <table name=“price tab”> <columnname=“price” type=“double” path=“/Order/Part/ExtendedPrice”multi_occurrence=“YES”/> </table> <table name=“ship_tab”> <columnname=“date” type=“date” path=“/Order/Part/Shipment/ShipDate”multi_occurrence=“YES”/> </table> </Xcolumn> </DAD>

[0231] In the above DAD, Litem_DAD1.dad is the name of the DAD. Thephrase <?.xml version=“1.0”?> identifies the version, and the phrase<!DOCTYPE Order SYSTEM “E:\dtd\dxxdad.dtd”> is text for the XML documenttype definition. The first DAD and the second DAD tags indicate that theinformation between these tags comprise the data access definition. Thephrase <dtdid>E:\dtd\lineItem.dtd</dtdid> identifies the document typedefinition (DTD) to be used. The phrase <validation>YES</validation>indicates that this DAD is to be validated against the DTD. The fourtable name terms identify the four side tables to be created.

[0232] In this example, the four side tables created for indexing are asfollows:

[0233] order_tab: with columns of order_key and customer; representingattribute “/Order/@Key” and element “/Order/Customer”.

[0234] part_tab: with column of part_key, representing attribute“/Order/Part/@Key”.

[0235] price_tab: with column of price, representing element“/Order/Part/Price”

[0236] ship_tab: with column of date, representing element“/Order/Part/Shipment/ShipDate”.

[0237] For this example, it is assumed that the columns in the tablesare the elements and attributes which need to be searched frequently.

[0238]FIG. 3 illustrates an application or main table and its four sidetables. The Application table 300 has a root_id in common with each sidetable 302, 304, 306, and 308. The side tables 302, 304, 306, and 308correspond to the side tables defined in the DAD above.

[0239] D.3 XML Column/User Defined Types

[0240] An XML column is designed to store XML documents in their nativeformat in the database as column data. After a database is enabled, thefollowing user defined types (UDTs) are created:

[0241] XMLCLOB: XML document content stored as a CLOB inside thedatabase,

[0242] XMLVarchar: XML document content stored as a VARCHAR inside thedatabase,

[0243] XMLDBCLOB: XML document content stored as double byte CLOB insidethe database,

[0244] XMLFile: XML document stored in a file on a local file system

[0245] XMLURL: XML document stored as a uniform resource locator (URL)via Data Link.

[0246] A user can use these UDTs as the data type of an XML column. AnXML column is created when a user creates or alters an applicationtable.

[0247] D.4 Creating an XML Table

[0248] An XML table is a table that includes one or more columns createdwith the XML System UDT. To create such a table, an XML column isincluded in the column clause of the CREATE TABLE statement.

[0249] Consider a line item order book keeping application. The XMLformatted line item order is to be stored in a column called “order” ofan application table called “sales_tab”. The sales_tab table alsoincludes other columns of invoice_number and sales_person. Since theorder is not very long, a user may decide to store it in the XMLVarchartype. The user may also decide to let the invoice_number be the primarykey. The following create table statement can be used, where XMLVarcharis the XML System UDT: CREATE TABLE sales_tab (invoice_number char(6)NOT NULL PRIMARY KEY, sales_person varchar(20), order XMLVarchar);

[0250] D.5 Defining Xcolumn in DAD

[0251] In order to use an XML column, a DAD needs to be prepared andenabled. In DAD preparation, a user first needs to define an “Xcolumn”.The following steps guide a user to define an ‘Xcolumn”, using theexamples: XML document order.xml in C.5, DTD LineItem.dtd in C.4, andDAD Litem_DAD1.dad in D.2.

[0252] Identify the XML elements and attributes which will be frequentlysearched in the application.

[0253] In the above examples, the “/Order/@Key”, “/Order/Customer”,“/Order/Part/@Key”, “/Order/Part/ExtendedPrice”.“/Order/Part/Shipment/ShipDate” are mostly like to be searched and rangesearch is needed for some of them.

[0254] Decide how many side tables will be created for indexing. This isbased on the understanding of the DTD and XML documents.

[0255] In the above examples, since “/Order” has unique attribute “Key”and only one element “Customer”, they are put in the same side table“order_tab”. One “Order” can have one or more “Part” items (see DTDdefinition in C.4), and each “Part” will have unique attribute “Key” andelement “ExtendedPrice”, and so these are separates into two tables:“part_tab” and “price_tab”. Now, since one “Part” can have multiple“Shipment” items and each “Shipment” has one “ShipDate”, the “ShipDate”is put into another table “ship tab”.

[0256] Define the column of each side table by specifying the columnname, the matching XML element or attribute by location path, and thedata type.

[0257] In the examples, the ability to perform range search is desiredon “ExtendedPrice” and “ShipDate”, thus the data type is specified to bedouble and date respectively. Because there will be multiple occurrencesof the /Order/Part/@key, /Order/Part/ExtendedPrice and/Order/Part/Shipment/ShipDate, specify the multi_occurrence=“YES” forthese elements or attributes. By doing so, the XML System will create anadditional column DXX_SEQNO for side table price_tab and ship_tab sothat a query can be performed using “order by DXX_SEQNO” to get theelement or attribute with the same order as that in the original XMLdocuments.

[0258] D.6 Enabling Parameters

[0259] A column can be enabled through the XML System administration GUIor using, a dxxadm command with the enable_column option. The syntax ofthe option is as follows s:

[0260] dxxadm enable_column db_name tab_name column_name DAD_file

[0261] -t tablespace -v default_view -r root_id]

[0262] where:

[0263] db_name: the database name

[0264] tab_name: table name in which the XML column resides.

[0265] column_name: name of the XML column.

[0266] DAD_file: name of the file that contains Data Access Definition(DAD).

[0267] tablespace: optional, but if specified, a previously createdtablespace which will contain side tables created by the XML System.

[0268] default_view: optional, but if specified, it is the name of thedefault view created by XML System to join application table and allside tables.

[0269] root_id: optional, but recommended, and if specified, it is thecolumn name of the primary key in the application table, and XML Systemwill use it as the unique “root_id” to tie all side tables with theapplication table. If not specified, XML System will add the column ofDXXROOT_ID in the application table. Note: if the application tablehappened to have a column name as “DXXROOT_ID”, the primary key must bespecified as the “root_id”. otherwise, an error will be returned.

[0270] Here is an example for enabling the column order in the tablesales_tab in database mydb with the DAD_file Litem_DAD1.dad in C.4,default view sales_order_view and root_id invoice_number.

[0271] /home/u1>dxxadm enable_column mydb sales_tab order Litem_DAD1.dad

[0272] -v sales_order_view -r invoice_number

[0273] DXXA007I XML Extender is enabling column order. Please wait.

[0274] DXXA008I XML Extender has successfully enabled the column order.

[0275] /home/u1>

[0276] D.7 Results of the Column Enabling

[0277] The enabling of an XML column mainly does the following things toa database:

[0278] Read the DAD_file and do the following:

[0279] if DTDID is specified, retrieve the DTD from the dtd_ref table.

[0280] process Xcolumn to create side tables

[0281] create triggers for insert, update and delete on the XML columnso that the side tables will be populated or updated.

[0282] Create a default_view if specified.

[0283] If root_id not specified. alter application table to addDXXROOT_ID column.

[0284] Update the XML_USAGE and dtd_ref table to reflect the enabling ofthis XML column.

[0285] Based on the above examples, the user table sales_tab has thefollowing schema:

[0286] Based on the above examples, the user table sales_tab has thefollowing schema:

[0287] User table sales_tab: Column Name invoice_number sales_personorder Data Type char(6) varchar(20) XMLVarchar

[0288] The enabling column operation will create the following sidetables based on the DAD: Side_table order_tab: Column Name order_keycustomer invoice_number Data Type integer varchar(50) char(6) LocationPath /Order/@Key /Order/Customer N/A Side table part_tab: Column Namepart_key invoice_number Data Type integer char(6) Location Path/Order/Part/@Key N/A Side table price_tab: Column Name priceinvoice_number Data Type double char(6) Location Path/Order/Part/ExtendedPrice N/A Side table ship_tab: Column Name dateinvoice-number Data Type date char(6) Location Path/Order/Part/Shipment/ShipDate N/A

[0289] Note that because the root_id is specified by the primary keyinvoice_number in the application table sales_tab, all side tables havethe column invoice_number of the same type. Also, the value of theinvoice_number of each row in the sales_tab will be inserted into theside tables.

[0290] Since the default_view parameter is specified when enabling theXML column order, a default view sales_order_view is created by the XMLSystem. It joins the above five tables by the following statement:

[0291] CREATE VIEW sales_order_view(invoice_number,sales_person,order,order_key,customer,part_key,price,date)

[0292] AS

[0293] SELECT sales_tab.invoice_number, sales_tab.sales_person,sales_tab.order, order_tab.order_key, order_tab.customer,part_tab.part_key, price_tab.price, ship_tab.date)

[0294] FROM sales_tab, order_tab, part_tab, price_tab, ship_tab

[0295] WHERE sales_tab.invoice_number =order_tab.invoice_number

[0296] AND sales_tab.invoice_number =part_tab.invoice_number

[0297] AND sales_tab.invoice_number =price tab.invoice_number

[0298] AND sales_tab.invoice_number =ship_tab.invoice_number.

[0299] Because the tablespace in the enable_column command was notenabled. the default tablespace is used to create side tables. If thetablespace is specified and it does exist in the database, then the sidetables will be created in the specified side tables.

[0300] D.8 Inserting XML Documents

[0301] For XML columns, an entire XML document is always stored as thecolumn data. The insertion can be achieved in the following ways:

[0302] Using the default cast function:

[0303] For each UDT, there is a default cast function to convert the SQLbase type to the UDT. The following cast functions can be used in aVALUES clause. Input Default UDT Parameter Return Cast Function TypeType Description db2xml.XMLVarchar( ) varchar XMLVarchar Input frommemory buffer of varchar db2xml.XMLCLOB( ) clob XMLCLOB Input frommemory buffer of clob db2xml.XMLDBCLOB( ) dbclob XMLDBCLOB Input frommemory buffer of dbclob db2xml.XMLFile( ) varchar XMLFile Only storefile name db2xml.XMLURL( ) datalink XMLURL data type

[0304] The following SQL statement inserts the casted varchar type inthe host variable xml_buff into the XMLVarchar.

[0305] INSERT INTO sales_tab

[0306] VALUES(‘123456’, ‘Sriram Srinivasan’, db2xml.XMLVarchar(:xml_buff))

[0307] Using the Storage UDF:

[0308] For each XML System UDT, there is a storage(or import) UDF toimport data from a resource other than its base type. For example, if toimport an XML document in a file to the database as a XMLVarchar, thenthe function XMLVarcharFromFile( ) is used.

[0309] In the example below, a record is inserted into the sales_tabtable. The function XMLVarcharFromFile( ) imports the XML document froma file into the database and stores it as a XML Varchar.

[0310] EXEC SQL INSERT INTO sales_tab VALUES(‘123456’,

[0311] ‘Sriram Srinivasan’

[0312] XMLVarcharFromFile(‘/home/u1/xml/order.xml’))

[0313] The above example imports the XML object from the file“/home/u1/xml/order.xml” to the column order in the table sales_tab.

[0314] D.9 Retrieving XML Documents

[0315] The XML table is ready to use when the XML column is enabled.Retrieving an XML column directly returns the UDT as the column type. Auser can always use the default cast function provided by The databasefor distinct types to convert a UDT to an SQL base type, then operate onit. In addition to that, a user can also use overloaded UDF Content( )to retrieve document content from a file or URL to a memory buffer.

[0316] Using the default cast function:

[0317] The following cast functions, which are automatically created bythe database for the XML UDT, may be used in a SELECT statement. DefaultCast Input Return Function Parameter Type Type Descriptiondb2xml.varchar( ) XMLVarchar varchar XML document in variable length ofchar db2xml.clob( ) XMLCLOB clob XML document in CLOB db2xml.dbclob( )XMLDBCLOB dbclob XML in double byte CLOB db2xml.varchar( ) XMLFilevarchar XML filename in variable length of char db2xml.datalink( )XMLURL datalink URL of XML document

[0318] The following SQL statement shows how to use the default castfunction in a simple query.

[0319] EXEC SQL SELECT db2xml.varchar(order)from sales_tab

[0320] Using the content( ) UDF:

[0321] Suppose XML documents are stored as XMLFile or XMLURL, to operateon these XML documents in memory, the UDF content( ), which takesXMLFile or XMLURL as input and returns a varchar or CLOB, is used.

[0322] In the example below, a small sqc program segment illustrates howan XML document is retrieved from a file to memory. This example assumesthat the column order is of XMLFile type. EXEC SQL BEGIN DECLARESECTION; varchar(3k) xml_buff; EXEC SQL END DECLARE SECTION; EXEC SQLCONNECT TO mydb EXEC SQL DECLARE cl CURSOR FOR SELECT Content(order)from sales_tab WHERE sales_person = ‘Sriram Srinivasan’ EXEC SQL OPENcl; do { EXEC SQL FETCH cl INTO :xml_buff; if(SQLCODE != 0) { break; }else {  /* do whatever is needed to do with the XML doc in buffer */  }} EXEC SQL CLOSE cl;

[0323] D.10 Updating XML Documents

[0324] With the XML System, an entire XML document can be updated byreplacing the XML column data. The XML System provides two techniquesfor update:

[0325] Using cast functions or storage UDFs in the set clause of the SQLupdate statement:

[0326] In this case, a cast function or a UDF is used in the Set clause.Here is an example:

[0327] UPDATE sales_tab

[0328] set order=XMLVarcharFromFile(‘/home/u1/xml/order2.xml’)

[0329] WHERE sales_person=‘Sriram Srinivasan’

[0330] Using the Update( ) UDF:

[0331] The XML System provides a UDF Update( ) which allows a user tospecify a location path and the value of the element or attributerepresented by the location path to be replaced. In this case, a userdoes not need to retrieve the XML document and use an editor to changethe content. The XML System will do it automatically.

[0332] Here is an example of using the UDF Update( ). In this example,the content of “/Order/Customer” is updated to NewMart”.

[0333] UPDATE sales_tab

[0334] set order=Update(order.‘/Order/Customer’,‘NewMart’)

[0335] WHERE sales_person=‘Sriram Srinivasan’

[0336] For an XML Column, the XML System will update side tables ofextracted data when the XML column is updated. However, a user shouldnot update these side tables directly without updating original XMLdocuments stored in the XML column by changing the corresponding XMLelement or attribute value. Otherwise, there may be data inconsistencyproblems.

[0337] D.11 Retrieving XML Element Contents and Attribute Values

[0338] For XML columns, the XML System provides a UDF to extract elementor attribute values from entire XML documents. The retrieval isperformed on an XML document. It is a single document search. The XMLSystem provides extracting UDFs to retrieve XML elements or attributesin the SQL select clause. This is very useful after search filtering ona collection of XML documents to further obtain desired elements orattributes.

[0339] Suppose there are more than 1000 XML documents stored in thecolumn order in the table sales_tab. To find all customers who haveordered items which have the ExtendedPrice greater than $2500.00, thefollowing SQL statement with the extracting UDF in the select clause canbe used:

[0340] SELECT extractVarchar(Order,‘/Order/Customer’) fromsales_order_view

[0341] WHERE price>2500.00

[0342] where the UDF extractVarchar( ) takes the order as the input, andthe location path “/Order/Customer” as the select identifier, andreturns the names of the customer. Note, in this statement, only theorders with ExtendedPrice greater than $2500, say maybe 11 such orders,will be the input to the extracting function. The WHERE clause did thefiltering on the collection of 1000 XML documents already. Again, thesales_order_view is the default view to join the application tablesales_tab and all its side tables, where price is the part_tab.price,representing the “/Order/Part/ExtendedPrice”.

[0343] D.12 Searching an XML Document

[0344] The above sections have described how the XML System may be usedas a document repository for storage and retrieval, as well as forelement or attribute selection. Here, searching using indices created onside table columns, which contain XML element contents or attributevalues extracted from XML documents, is illustrated. Since the data typeof an element or attribute can be specified, searches can be performedon SQL general data types and range searches can be performed.

[0345] D.13 Search from Join View

[0346] If desired and specified when an XML column is enabled, the XMLSystem provides a default read-only view which joins the applicationtable with all created side tables through the same unique identifier.With the default view, or any view created by the application, a usercan search XML documents by a query on the side tables.

[0347] The above examples have referenced an application table sales_taband side tables order_tab, part_tab and ship_tab. The name of a defaultview sales_order_view is specified at the enabling column time. XMLSystem had created a default view sales_order_view which joins thesetables by the statement shown in the previous section.

[0348] The following example SQL statement will return the sales_personsof the sales_tab who have line item orders stored in the column orderwhere the ExtendedPrice is greater than $2500.00.

[0349] SELECT sales_person FROM sales_order_view

[0350] WHERE price>2500.00

[0351] The advantage of a query on the join view is that it provides avirtual single view of the application table and side tables. However,when more side tables are created. the more expensive the query will be.Therefore, it is only recommended when the total number of side tablecolumns is small. An application can create a desired view by joiningimportant side table columns for optimization. Note that the root_id,which can be the specified primary key in the application table or theDXXROOT_ID created by the XML System. provides the way to join tables.

[0352] D.14 Direct Query on Side Tables

[0353] Since the DAD is specified by the application, the side tablescreated by the XML System are known to the application programmer. Forbetter performance. an application can do query or sub-query on sidetables directly. The following example shows how to do so for the samequery stated above:

[0354] SELECT sales_person from sales_tab

[0355] WHERE invoice_number in

[0356] (SELECT invoice_number from part_tab

[0357] WHERE price>2500.00)

[0358] Note that the invoice_number is the primary key in theapplication table sales_tab. The advantage of direct query withsub-query is better performance. When side tables have parent-childrenrelationships, direct query with sub-query often make more sense.

[0359] D.15 Query Using UDF

[0360] In one embodiment, the side tables are created by the DAD, andindices are created for columns in the side tables. Therefore, thesearch will be fast with indexing.

[0361] In another embodiment, it is not required that a user create sidetables or indices on columns of side tables. The application still canuse the extracting UDFs to do the query. Since each extracting UDF willdo the source scan, it is very expensive. It should be used when otherrestrictions are applied to the WHERE clause so that the source scan isperformed to a limited number of XML documents.

[0362] Here is an example:

[0363] SELECT sales_person from sales_tab

[0364] WHERE extractVarchar(order,‘/Order:Customer’) like ‘%NewMart%’ANDinvoice_(—number>)100

[0365] D.16 Search on an Element or Attribute with Multiple Occurrences

[0366] In XML documents, one element name type may occur multiple times.Since attributes belong to elements. the same location path of anattribute may often refer to multiple values. The term “multipleoccurrence” will be used to specify this case.

[0367] In the DAD, a user can specify whether the location path willhave multiple occurrence. In the above DAD example, the“/Order/Part/price” has multiple occurrence, and the side tableprice_tab was crated for it. It is possible to have multiple rows in thepart_tab table containing the same invoice_number. Therefore, a usershould only select the distinct values. The following provides anexample of how to do query for this case:

[0368] SELECT sales_person from sales_tab

[0369] WHERE invoice_number in

[0370] (SELECT DISTINCT invoice_number from price_tab WHEREprice>2500.00)

[0371] On the other hand, since XML System provides additional columnDXX_SEQNO in the price_tab, a user can select a price and pair it withthe corresponding ShipDate. The following is an example:

[0372] SELECT price_tab.price, ship_tab.date from price_tab, ship_tab

[0373] WHERE price_tab.invoice_number=ship_tab.invoice_number AND

[0374] price_tab.DXX_SEQNO=ship_tab.DXX_SEQNO

[0375] A user can also select the price ordered by the sequence number,as illustrated in the following example:

[0376] SELECT price price_tab ORDER by DXX_SEQNO

[0377] D.17 Structural-Text Search

[0378] In one embodiment of the invention, the structural-text or fulltext search is performed after enabling XML columns with Text Extender,a product from International Business Machines, Corporation.

[0379] In the examples discussed herein, to perform structural-text onthe column order, a user can enable the column with the Text Extender,by specifying a text handle name, say “orderHandle”. Then with the TextExtender's section search support, the XML document with the word “XYZ”in the section “/Order/Customer” can be found. The following exampleshows how:

[0380] SELECT order FROM sales_tab

[0381] WHERE Contains(orderHandle, ‘model Ordersection(/Order/Customer)\“XYZ\”)=1

[0382] Where Order is the model name and Order/Customer is the sectionname.

[0383] D.18 Deleting XML Documents

[0384] Deleting a row from an XML table is done with a SQL DELETEstatement. A user can use the search technique discussed above tospecify the WHERE clause.

[0385] The following is a simple example:

[0386] DELETE from sales_tab

[0387] WHERE invoice_number in

[0388] (SELECT invoice_number from part_tab

[0389] WHERE price>2500.00)

[0390] D.19 Disable Columns

[0391] The disable_column option disables the XML enabled column. Thefollowing is the syntax for disabling a column:

[0392] dxxadm disable_column db_name tab_name column_name

[0393] The following are the arguments for disable_column:

[0394] db_name: the database name,

[0395] tab_name: the table name in which the XML column resides.

[0396] column_name: the name of XML column.

[0397] The following actions are performed by disable_column:

[0398] Delete the entry from the XML_USAGE table.

[0399] Decrement the USAGE_COUNT in the DTD_REF table.

[0400] Drop all triggers created with this column.

[0401] Drop side table associated with this column.

[0402] In one embodiment, a user must disable an XML column beforedropping an XML table. If an XML table is dropped, but its XML column isnot disabled, then all side tables created by the XML System will not bedropped. This may cause problems for the XML system to keep track of thenumber of enabled XML columns.

[0403] D.20 Detailed Techniques

[0404] The server code is the core of XML System. It has several majorcomponents, and each one performs a unique role in the product.

[0405] The admin stored procedures are used to “xmlally” enable anddisable the database, columns and indices. For performance and simplicy,these stored procedures were written in the embedded SQL.

[0406] The XML System provides a number of functions in the server code.The functions are: dxxEnableDB, dxxDisableDB( ), dxxEnableColumn( ),dxxDisableColumn( ), and dxxEnableCollection( ).

[0407] The dxxEnableDB stored procedure enables a database for XMLdocument access. It uses the DDL statements to create XML System UDTs, aset of external UDFs, a set of internal UDFs, the DTD reference table,and the XML_USAGE table. The implementation of these UDFs are in theUDFs component.

[0408] The dxxDisableDB( ) stored procedure drops everything created bythe dxxEnabeDB( ). It does error checking on DTD_REF and XML_USAGEtables.

[0409] The dxxEnableColumn( ) stored procedure enables an XML column ofthe XML System UDT. It parses the input DAD, create side tables, andtriggers according to the DAD. It also updates the XML_USAGE table.

[0410] The dxxDisableColumn( ) stored procedure disables an XML column.It deletes all side tables created by the XML System and updates theXML_USAGE table.

[0411] The dxxEnableCollection( ) stored procedure enables an XMLcollection. It inserts a new row in the XML_USAGE table and stores theinput DAD there. It checks or creates collection tables according to theDAD.

[0412] The design description comprises program functions that implementthe stored procedures in the source. These are listed below:

[0413] dxxdb.sqc

[0414] 1. dxxEnableDB( ) function name caller input output dxxEnableDB() main( ) in dbName errCode, (enable_db) dxxadm.sqc errMsg

[0415] The following is a Functional Description:

[0416] 1. Initialize input and output parameters form sqlda, and sqlca.

[0417] 2. Loop through enable statement array to execute each DDL.

[0418] 3. Check whether database is DBCS enabled.

[0419] 4. If DBCS enabled, loop to execute each DBCS enable DDLstatement

[0420] 5. error check.

[0421] 6. Set output parameter.

[0422] 2. dxxDisableDB( ) function name caller input outputdxxDisableDB( ) main( ) in dbName errCode, (disable_db) dxxadm.sqcerrMsg

[0423] The following is a Functional Description:

[0424] 1) Loop through disable statement array to execute each DDL.

[0425] 2) Check whether database is DBCS enabled.

[0426] 3) If DBCS enabled, loop to execute each DBCS disable DDLstatement.

[0427] 4) Error initialize input and output parameters form sqlda, andsqlca check.

[0428] 5) Set output parameter.

[0429] 3. IsDBCS_DB( ) function name caller input output isDBCS_DB( )dxxEnableDB( ), dbName TRUE or FALSE dxxDisableDB( )

[0430] The following is a Functional Description:

[0431] Check whether database is DBCS enabled, if so. return TRUE,otherwise return FALSE.

[0432] dxxcol.sqc

[0433] 1. dxxEnableColumn( ) function name caller input outputdxxEnableColumn main() in 2. dbName. 1. errCode dxxadm.sqc 3. tabName.2. errMsg 4. colName. 5. dadBuf. 6. tablespace. 7. defaultView. 8.rootID.

[0434] The following is a Functional Description:

[0435] 1. Initialize host variables.

[0436] 2. Initialize output parameters.

[0437] 3. Call getParameter( ) to get input parameters,

[0438] Call check_table( ) to check table name.

[0439] Call check_column( ) to check input column name, and

[0440] Check rootID.

[0441] 4. Initialize XML4C parser.

[0442] 5. Parse DAD.

[0443] 6. Call dad_popu( ) to populate DAD into internal data structure.

[0444] 7. Error checking on DAD:

[0445] access_mode,

[0446] DTDID

[0447] 8. Get colno of this column from syscat, used as suffix oftriggers.

[0448] 9. Call createSideTables( ) to create side tables.

[0449] 10. Create triggers on user tables:

[0450] 1. rootidTrigger_BIT,

[0451] 2. insertTrigger_AIT,

[0452] 3. deleteTrigger_ADT,

[0453] 4. updateTrigger_AUT,

[0454] 5. validateTrigger_VIT,

[0455] 6. validateTrigger_VUT

[0456] 11. Create default view.

[0457] 12. Insert a row into XML_USAGE table.

[0458] 13. Update DTD_REF table.

[0459] 14. Set output error message.

[0460] 15. Error check on commit.

[0461] 16. Free DAD structure.

[0462] 2. dxxDisableColumn( )

[0463] function name caller input output

[0464] dxxDisableColumn main( ) in dxxadm.sqc

[0465] 1. dbName,

[0466] 2. tabName,

[0467] 3. colName,

[0468] 1. errCode

[0469] 2. errMsg

[0470] The following is a Functional Description:

[0471] 1. Initialize host variables.

[0472] 2. Initialize output parameters.

[0473] 3. Call getParameter( ) to get input parameters.

[0474] 4. Check input parameters:

[0475] 1. call check_table( ) to check table name, and

[0476] 2. call check_column( ) to check input column name.

[0477] 5. Get column name, DAD, DTDID, defaultView,

[0478] Trigger suffix from XML_USAGE table.

[0479] 6. Check whether column is XML enabled and trigger suffix exists.

[0480] 7. Drop default view.

[0481] 8. Parse DAD and populate DAD data structure.

[0482] 9. Using DAD structure to delete all side tables.

[0483] 10. Call createSideTables( ) to create side tables.

[0484] 11. Drop triggers on user tables:

[0485] 1. reset DXXROOT_ID, drop rootidTrigger_BIT

[0486] 2. insertTrigger_AIT,

[0487] 3. deleteTrigger_ADT,

[0488] 4. updateTrigger_AUT,

[0489] 5. validateTrigger_VIT, and

[0490] 6. validateTrigger_VUT.

[0491] 12. Update DTD_REF table.

[0492] 13. Delete the row into XML_USAGE table.

[0493] 14. Set output error message.

[0494] 15. Error check on commit.

[0495] 16. Free DAD structure.

[0496] 3. check_table( ) function name caller input output check_tabledxxEnableColumn(), 1. dbName, 1. errCode dxxDisableColumn() 2. errMsg

[0497] The following is a Functional Description:

[0498] Check whether the table exists in the database by looking at thesyscat.columns.

[0499] 4. check_column( ) function name caller input output check_columndxxEnableColumn(), 1. table name, 1. errCode dxxDisableColumn() 2.column name 2. errMsg

[0500] The following is a Functional Description:

[0501] Check whether the column exists in the right table by looking atthe syscat.columns

[0502] 5. getParameter( )

[0503] function name caller input output

[0504] getParameter dxxEnableColumn( ),

[0505] dxxDisableColumn( )

[0506] 1. in_sqlvar

[0507] 1. data,

[0508] 2. errCode

[0509] 3. errMsg

[0510] The following is a Functional Description:

[0511] Extract parameter data from in_sqlvar, according to SQL_TYPE.

[0512] 6. createSideTable( ) function name caller input outputcreateSide dxxEnableColumn() 1. pDAD, 1. errCode Table() 2. rootid, 2.errMsg 3. rootid definition 4. tablespace,

[0513] The following is a Functional Description:

[0514] This routine creates all side tables specified in the DAD. Ittakes the pDAD (pointer to DAD) data structure, looping the list of sidetables, and generates the “CREATE TABLE” statement.

[0515] 1. If the rootid is not specified,

[0516] then add DXXROOT_ID as a not-null column,

[0517] else add the primary key in user table as a not-null column.

[0518] 2. If the tablespace is specified, add “IN” tablespace to theCREATE TABLE statement.

[0519] 7. rootidTrigger_BIT function name caller input outputrootidTrigger_BIT dxxEnableColumn() 1. pDAD 1. errCode 2. errMsg

[0520] The following is a Functional Description:

[0521] This routine creates the Before Insert Trigger (BIT) to add thevalue of DXXROOT_ID, which is generated from the generate_unique( )function.

[0522] 1. Check syscat.triggers to see whether the BIT exists or not ifexist, return error.

[0523] 2. Execute the create trigger statement as: function name callerinput output insertTrigger_AIT dxxEnable 1. pDAD 1. errCode, Column(),2. trigger_suffix, 2. errMsg 3. rootid,

[0524] where user_table and xmlcolumn are taken from pDAD.

[0525] 8. insertTrigger_AIT Loop through pDAD->s_table.for each s_table,pDADst do: For each column in the pDADst do: 1.  If (pDADst->col is notmultiple occurred) then set the trieger_stmt to: INSERT INTO side_tabVALUES (NEWROW.rootid, db2xml.extractDataType(xmlcolumn,path)) else setthe trigger_stmt to: INSERT INTO side_tab SELECTNEWROW.rootid,db2xml.seqno( ). RETURNED DataType FROMTABLE(db2xml.extractDataTypes(xmlcolumn.path)) x 2.  Execute thestatement: “CREATE TRIGGER user_tab.AITtrigger_suffix AFTER INSERT ONuser_tab REFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN(xmlcolumn IS NOT NULL) BEGIN ATOMIC  trigger_stmt; END” where user_tab,side_tab, xmlcolumn and path are getting from pDADst, and DataType isgetting from the call of mapType(pDADst->col->type} Note: thedb2xml.seqno( ) is a UDF to generate a sequence number of multipleoccurrence.

[0526] The following is a Functional Description:

[0527] This routine creates the After Insert Trigger (AIT) to populatethe side tables after a row is inserted into the user table with an XMLcolumn.

[0528] Loop through pDAD->s_table, for each s_table, pDADst do:

[0529] For each column in the pDADst do:

[0530] 1. If (pDADst->col is not multiple occurred)

[0531] then set the trigger_stmt to:

[0532] INSERT INTO side_tab VALUES

[0533] (NEWROW.rootid,

[0534] db2xml.extractDataType(xmlcolumn,path))

[0535] else set the trigger_stmt to:

[0536] INSERT INTO side_tab

[0537] SELECT NEWROW.rootid,db2xml.seqno( ),

[0538] RETURNED DataType FROM

[0539] TABLE(dx2xml.extractDataTypes(xmlcolumn,path)) x

[0540] 2. Execute the statement:

[0541] “CREATE TRIGGER user_tab.AITtrigger_suffix

[0542] AFTER INSERT ON user_tab

[0543] REFERENCING NEW AS NEWROW

[0544] FOR EACH ROW MODE DB2SQL

[0545] WHEN (xmlcolumn IS NOT NULL)

[0546] BEGIN ATOMIC

[0547] trigger_stmt;

[0548] END”

[0549] where user_tab, side_tab, xmlcolumn and path are getting frompDADst, and DataType is getting from the call ofmapType(pDADst->col->type)

[0550] Note: the db2xml.seqno( ) is a UDF to generate a sequence numberof multiple occurrence.

[0551] 9. delete trigger_ADT function name caller input outputdeleteTrigger_ADT dxxEnableColumn() 1. pDAD 1. errCode 2.trigger_suffix, 2. errMsg 3. rootid,

[0552] The following, is a Functional Description:

[0553] This routine creates the After Delete Trigger (ADT) to deleterows in side tables after a row is deleted from the user table with anXML column

[0554] Loop through pDAD->s_table. for each s_table, pDADst do: executethe statement: “CREATE TRIGGER user_tab.ADTtrigger_suffix AFTER DELETEON user_tab REFERENCING OLD AS OLDROW FOR EACH ROW MODE DB2SQL BEGINATOMIC DELETE FROM side_tab WHERE OLDROW.rootid = side_tab.rootid END”

[0555] where user_tab, side_tab, xmlcolumn and path are getting frompDADst

[0556] 10. updateTrigger_AUT function name caller input outputupdateTrig- dxxEnableColumn() 1. pDAD, 1. data. ger_AUT 2.trigger_suffix, 2. errCode 3. rootid 3. errMsg

[0557] The following is a Functional Description:

[0558] This routine creates the After Update Trigger (AUT) to updaterows in side tables after a row is updated in the user table with an XMLcolumn. Loop through pDAD->s_table, for each s_table. pDADst do: Foreach column in the pDADst do: If (pDADst->col is not multiple occurred)then set the trigger_stmt to: UPDATE side_tab SET set_stmt WHERENEWROW.rootid = side_tab.rootid else set the trigger_stmt to: DELETEFROM side_tab WHERE NEWROW.rootid = side_tab.rootid; INSERT INTOside_tab SELECT NEWROW.rootid, db2xml.seqno( ). RETURNED DataType FROMTABLE(db2xml.extractDataTypes(xmlcolumn.path)) x Execute the statement:“CREATE TRIGGER user_tab.AUTtrigger_suffix AFTER UPDATE ON user_tabREFERENCING NEW AS NEWROW FOR EACH ROW MODE DB2SQL WHEN (xmlcolumn ISNOT NULL) BEGIN ATOMIC trigger_stmt: END” where user_tab, side_tab,xmlcolumn and path are getting from pDADst, and Data Type are gettingfrom the call of mapType(pDADst->col->type)

[0559] 11. validationTrigger_VBIT function name caller input outputvalidationTrig- dxxEnableColumn() 1. pDAD, 1. errCode, ger_VBIT 2.trigger_suffix 2. errMsg

[0560] The following is a Functional Description:

[0561] This routine create a Validation Before Insert Trigger (VBIT) tovalidate an input XML document before inserting it into a user table.Due to the use of XML4C parser, it retrieves the DTD from dtd_ref tableand puts it in an external file, then calls the UDF db2xml.validate inthe trigger.

[0562] It executes the following statement: “CREATE TRIGGERuser_tab.VBITtrigger_suffix BEFORE INSERT ON user_tab REFERENCING NEW ASNEWROW FOR EACH ROW MODE DB2SQL WHEN validation != 0 SIGNAL SQLSTATE‘DXX_SQLSTATE_INVALID_DOC’ (DXX_0000E) where validation= SELECTdb2xml.validate(NEWROW.xmlcolumn, db2xml.content(content,tmpfileName),pDAD−>dtdid) FROM db2xml.dtd_ref WHERE dtdid=pDAD−>dtdid)

[0563] where user_tab, xmlcolumn are getting from pDAD, tmpefileName isset by this routine, and the “content” is the column name in dtd_ref forDTD. db2xml.content( ) is a UDF.

[0564] 12. validationTrigger_VBUT function name caller input outputvalidationTrig- dxxEnableColumn() 1. pDAD, 1. errCode ger_VBUT 2.trigger_suffix 2. errMsg

[0565] The following is a Functional Description:

[0566] This routine create a Validation Before Update Trigger (VBUT) tovalidate an input XML document before updating it in user table. Due tothe use of XML4C parser, it retrieves the DTD from dtd_ref table, putsit to an external file, then calls the UDF db2xml.validate in thetrigger.

[0567] It executes the following statement: “CREATE TRIGGERuser_tab.VBUTtrigger_suffix BEFORE UPDATE ON user_tab REFERENCING NEW ASNEWROW FOR EACH ROW MODE DB2SQL WHEN validation != 0 SIGNAL SQLSTATE‘DXX_SQLSTATE_INVALID_DOC’ (DXX_0000E) where validation= SELECTdb2xml.validate(NEWROW.xmlcolumn. db2xml.content(content,tmpfileName),pDAD−>dtdid) FROM db2xml.dtd_ref WHERE dtdid=pDAD−>dtdid)

[0568] where validation=

[0569] SELECT db2xml.validate(NEWROW.xmlcolumn,db2xml.content(content,tmpfileName), pDAD->dtdid)

[0570] FROM db2xml.dtd_ref WHERE dtdid=pDAD->dtdid)

[0571] where user_tab, xmlcolumn are getting from pDAD, tmpefileName isset by this routine, and the “content” is the column name in dtd_ref forDTD. db2xml.content( ) is a UDF.

[0572] 13. createDefaultView function name caller input outputcreateDefaultView dxxEnableColumn() 1. pDAD, 1. errCode 2. rootid, 2.errMsg 3. tablename, 4. default_view

[0573] The following is a Functional Description:

[0574] This routine creates a default view which joins the user tableand XML column side tables together with the name specified as the inputparameter default view. The key here is to join by the rootid, which canbe the DXXROOT_ID or the primary key of user table. As the input to thisroutine, the rootid is used as the column name for join.

[0575] 1. Declare a cursor on the statement:

[0576] SELECT colname FROM syscat.column WHERE tabname=tablename executethe statement, open the cursor and tetch on the cursor, while (! end offetch ){

[0577] get column and append user_tab.colunm to string userTableColumnsappend tablename to string alltablenames,

[0578] 2. Looping through the pDAD structure,

[0579] for each side table pDADst do:

[0580] append side table name pDADst->stbname to alltablenames,

[0581] append to string join_condition with

[0582] “tablename.rootid=pDADst->stbname.rootid”;

[0583] loop through the pDADst->col list,

[0584] for each column in the side table do:

[0585] append the pDADst->stbname.pDAdst->scolname to the stringsideTableColumns;

[0586] 3. execute the create view statement:

[0587] CREATE VIEW default_view AS

[0588] SELECT usertableColumns sideTableColumns FROM alltablenames

[0589] WHERE join_condition

[0590] D.21 Flow Diagrams

[0591]FIG. 4 is a flow diagram illustrating steps performed by the XMLSystem in creating and maintaining XML document data as column data. InBlock 400, the XML System creates a table with an XML column having aXML column type. The table is created in response to a CREATE TABLEstatement that specifies the XML column. In Block 402, the XML Systemenables the XML column. Next, the XML System, in Block 404, creates sidetables using a Data Access Definition for the XML column. In Block 406,the XML System creates triggers for Insert, Update, and Delete on theXML column, so that the side tables are populated when the main table ispopulated and the side tables are modified when the main table ismodified. Thus, the main table and side tables are synchronized. InBlock 408, when data is inserted into the main table, the XML Systeminserts data into the side tables. In Block 410, when the main table ismodified (i.e., data is updated or deleted), the XML System modifies theside tables.

[0592]FIGS. 5 and 6 illustrate key aspects of an embodiment of theinvention. In particular. these figures illustrate enabling a column anddisabling a column.

[0593]FIG. 5 is a flow diagram of steps performed by the XML System toenable a column. In block 500, the XML System initializes all variables.In block 502, the XML System gets and checks input parameters. In block504, the XML System calls a XML4C parser to parse a DAD. In block 506,the XML System determines whether the root_id is input by anapplication. If not, the XML System continues to block 508, otherwise,the XML system continues to block 510. In block 508, the XML Systemcreates side tables with DXXROOT_ID. In block 510, the XML Systemcreates side tables with the user table's primary key as the root_id. Inblock 512, the XML System creates the root_id. insert. delete. andupdate triggers on user tables.

[0594] In block 514, the XML System determines whether the DAD specifiesvalidation. If so, the XML System continues to block 516, otherwise, theXML System continues to block 518. In block 516, the XML System createsvalidation triggers. In block 518, the XML System determines whether adefault view is input by the application. If so, the XML Systemcontinues to block 520, otherwise, the XML System continues to block522. In block 520, the XML System creates a default view. In block 522,the XML System inserts an entry into XML_USAGE TABLE. In block 524, theXML System updates the DTD_REF.

[0595]FIG. 6 is a flow diagram of steps performed by the XML System todisable a column. In block 600, the XML System initializes allvariables. In block 602, the XML System gets and checks inputparameters. In block 604, the XML System gets the DAD, DTDID, anddefault view from the XML_USAGE TABLE. In block 606, the XML Systemdetermines whether the default view is null. If si, the XML Systemcontinues to block 610, otherwise, the XML system continues to block608. In block 608, the XML System drops the default view.

[0596] In block 610, the XML System parses the DAD to get side tablenames. In block 612. the XML System drops all side tables. In block 614,the XML System drops the root_id. insert, delete, and update triggers onuser tables. In block 616, the XML System determines whether the DADspecifies validation. If so, the XML System continues to block 618,otherwise, the XML System continues to block 620. In block 616, the XMLSystem creates validation triggers. In block 620, the XML System deletesthe entry from the XML_USAGE TABLE. In block 622, the XML System updatesthe DTD_REF table.

E. GENERATING ONE OR MORE XML DOCUMENTS FROM A SINGLE SQL QUERY

[0597] In one embodiment of the invention, an XML System is providedthat generates one or more XML documents from a single SQL query. Thistechnique is referred to as “SQL mapping”. The XML System retrieves datain existing relational database tables and forms a set of one or moreXML documents. Using the XML System, application programs can turnexisting business data into one or more new XML documents to beinterchanged from business to business via a network, such as theinternet or an intranet.

[0598] The XML System takes a single SQL query, along with a definitionof the data model from which one or more XML documents are to begenerated (i.e., a DAD), and forms one or more XML documents using thedata in existing database tables which meet the query condition.

[0599] The XML System is implemented by stored procedures which can becalled from the database client code. The stored procedures take a DataAccess Definition (DAD), which consists of the SQL query, the ExtensibleMarkup Language Path (XPath) data model based definition of the documentstructure to be generated, and a table name which will contain thegenerated one or more XML documents as its row data. The storedprocedures use a heuristic technique to eliminate duplication from theSQL query. Additionally, the stored procedure identifies the relationalhierarchy of the SQL query and maps the data obtained from the SQL queryinto elements and attributes of generated one or more XML documents.

[0600] An Xcollection defines how to compose one or more XML documentsfrom a collection of relational tables. An XML collection is a virtualname of a set of relational tables. Applications can enable an XMLcollection of any user tables. These user tables can be existing tablesof legacy business data or ones newly created by the XML System. A usercan access XML collection data through the stored procedures provided bythe XML System.

[0601] An XML collection is used to transform data between databasetables and one or more XML documents. An XML collection achieves thegoal of data interchange via XML. For applications that want to composeone or more XML documents from a set of relational tables, the XMLSystem offers a technique to enable an XML collection through a DocumentAccess Definition (DAD). In the Document Access Definition, applicationscan make a custom mapping between database column data in new orexisting tables to XML elements or attributes. The access to an XMLcollection is by calling the XML System's stored procedures or directlyquerying the tables of the collection.

[0602] E.1 Example

[0603] The following discussion provides an example of generating one ormore XML documents from a relational database using an SQL query and asimple DAD. In particular, a relational database is illustrated. Then,an SQL query is illustrated that is used to retrieve data from therelational database. Next, the results of the SQL query are illustrated.Moreover, the Document Access Definition (DAD), which contains the SQLquery is illustrated, along with a Document Type Definition (DTD). Afterthis, one XML document that is generated to contain the data retrievedby the SQL query is illustrated.

[0604] Relational Database:

[0605] order_tab: order_key customer_name customer_email customer_phone1 General Motor parts@gm.com 800-GM-PARTS part_key color qty price taxorder_key 156 red 17 17954.55 0.02 1  68 black 36 34850.16 0.06 1 128red 28 38000.00 0.07 1

[0606] ship_tab: date mode comment part_key 1998-03-13 TRUCK This is thefirst shipment to 156 service of GM. 1999-01-16 FEDEX This the secondshipment to 156 service of GM. 1998-08-19 BOAT This shipment isrequested  68 by a call. from GM marketing. 1998-08-19 AIR This shipmentis ordered by  68 an email. 1998-12-30 TRUCK NULL 128

[0607] The following is an SQL query. The SELECT term selects columns.The FROM term indicates the tables from which data is to be selected.The WHERE term indicates the conditions for selecting data. This SQLquery is defined in a Document Access Definition, which is illustratedbelow.

[0608] SELECT o.order_key, customer_name, customer_email, p.part_key,color, qty, price, tax, ship_id, date, mode

[0609] FROM order_tab o, part_tab p, table(selectsubstr(char(timestamp(generate unique( ))), 16)

[0610] as ship_id, date, mode, part_key from ship_tab)

[0611] WHERE order_key=1 and p.price>20000 and

[0612] p.order_key=o.order_key and s.part_key=p.part_key

[0613] The following is a table holding the results of executing the SQLquery: order_ customer_ customer_ part_ key name email key color qtyprice tax ship_id date mode 1 General parts@gm com  68 red 36 34850 160.06 4 58 825484 1998-08-19 BOAT Motor 1 General parts@gm com  68 red 3634850 16 0.06 4 58 825537 1998-08-19 AIR Motor 1 General parts@gm com128 red 28 38000 00 0.07 4 58 825589 1998-12-30 TRUCK Motor

[0614] The data in order_key, customer_name, customer_email, part_key,qty, price, and tax are duplicated for each shipment. The data inorder_key, customer_name, and customer_email are duplicated for eachpart. This issue is addressed by partitioning the columns intoequivalence classes that reflect the semantics of the relational data:{order_key, customer_name, customer email}, {part_key, color, qty,price, tax}, and {ship_id date, mode}. The XML System opens a new cursoronly when it crosses a boundary between classes.

[0615] A user can decide how structured XML documents are to be storedor created through a Document Access Definition (DAD). The DAD itself isan XML formatted document. The DAD associates XML documents to adatabase by defining an Xcollection. The SQL_stmt in the DAD is an SQLquery that specifies how columns of a table are to be mapped to XMLelements and attributes. The columns in the SELECT clause are mapped toXML elements or attributes. They will be used to define the value ofattribute_nodes or content of text_nodes. The FROM clause defines thetables containing the data, and the WHERE clause specifies the join andsearch conditions.

[0616] Assuming the following structure of an XML document will begenerated from the data selected by a SQL_stmt, how to use an XMLcollection to specify the DAD will be illustrated below. <?xmlversion=“1.0”?> <!DOCTYPE Order SYSTEM “E:\dxx\test\dtd\litem.dtd”><Order key=“1”>  <Customer>General Motor</Customer>  <Part key=“68”>  <Quantity>36</Quantity>   <ExtendedPrice>34850.16</ExtendedPrice>  <Tax>0.06</Tax>   <Shipment>    <ShipDate>1998-04-12</ShipDate>   <ShipMode>BOAT</ShipMode>   <Comment>This shipment is requested by a   call from GM marketing</Comment>   </Shipment>   <Shipment>   <ShipDate>1998-08-19</ShipDate>    <ShipMode>AIR</ShipMode>  <Comment>This shipment is ordered by an    email</Comment>  </Shipment>  </Part>=“128”>   <Quantity>28</Quantity>  <ExtendedPrice>38000.00</ExtendedPrice>   <Tax>0.07</Tax>   <Shipment>   <ShipDate>1998-12-30</ShipDate>    <ShipMode>TRUCK</ShipMode>  <Comment>This is the first shipment to    service of GM</Comment>  </Shipment>  </Part> </Order>

[0617] The following sample DAD shows how to define the mapping fromrelational tables to one or more XML documents using SQL mapping. Thefollowing sample DAD shows how to specify an SQL query to compose a setof one or more XML documents from data in three relational tables.Litem_DAD2.dad <?xml version=“1.0”?> <!DOCTYPE Order SYSTEM“E:\dtd\dxxdad.dtd”> <DAD> <dtdid>E:\dtd\lineItem.dtd</dtdid><validation>YES</validation> <Xcollection> <SQL_stmt> SELECTo.order_key, customer, p.part_key, qty, price, tax, ship_id,date,mode,comment FROM order_tab o, part_tab p,  table(selectsubstr(char(timestamp(generate_unique())),16) as  ship_id,date,mode,comments from ship_tab) as s WHERE p.price &amp;gt;2500.00 and s.date &gt; “1996-06-01” AND   p.order_key = o.order_key ands.part_key = p.part_key </SQL_stmt> <objids>  <column name=“order_key”/> <column name=“part_key”/>  <column name=“ship_id”/> </objids><prolog>?xml version=“1.0”?</prolog> <doctype>!DOCTYPE Order SYSTEM“E:\dtd\lineItem.dtd”</doctype> <root_node>  <element_node name=“Order”>  <attribute_node name=“Key”>    <column name=“order_key”/>  </attribute_node>  <element_node name=“Customer”>   <text_node>   <column name=“customer”/>   </text_node>  </element_node> <element_node name=“Part”>   <attribute_node name=“Key”>    <columnname=“part_key”/>   </attribute_node>   <element_node name=“Quantity”>   <text_node>     <column name=“qty”/>    </text_node>  </element_node>   <element_node name=“ExtendedPrice”>    <text_node>    <column name=“price”/>    </text_node>   </element_node>  <element_node name=“Tax”>    <text_node>     <column name=“tax”/>   </text_node>   </element_node>   <element_node name=“Shipment”>  <element_node name=“ShipDate”>    <text_node>     <column name=“date”>   </text_node>   </element_node>   <element_node name=“ShipMode”>   <text_node>     <column name=“mode”/>    </text_node>  </element_node>   <element_node name=“Comments”>    <text_node>    <column name=“comment”/>    </text_node>   </element_node> </element_node> <!-- end Shipment -->  </element_node> <!-- end Part--> </element_node> <!-- end Order --> </root_node> </Xcollection></DAD>

[0618] The SQL query should be in a top-down order of the relationalhierarchy. In the example, it is required to specify the selectedcolumns in the order of 3 levels: order, part and shipment. Within eachlevel, the objid must be the first column. If the order described is notpreserved, the generated XML documents may not be correct.

[0619] E.2 How to Use an XML Collection

[0620] An XML collection is a set of relational tables which contain XMLdata. These tables can be new tables generated by the XML System orexisting tables which have data to be used by the XML System to generateone or more XML documents. Stored procedures provided by the XML Systemserve as the access methods. Unlike the XML column, an XML collectiondoes not have to be enabled. The enablement is based on the operationsperformed.

[0621] A composition operation of an XML collection is to generate oneor more XML documents from data existing in the collection tables.Therefore, for this operation, an XML collection does not need to beenabled, providing all tables already exist in the database. The DADwill be passed to stored procedures. The DAD can be overridden by otherXML query parameters as the stored procedure input parameters. This kindof parameter can be obtained from various sources (e.g., dynamicallyfrom the web).

[0622] In the DAD preparation, first “Xcollection” is defined. AnXcollection can be defined for composition or decomposition, in the wayof either SQL mapping or RDB_node mapping. In both cases, the followingsteps should apply:

[0623] 1. Specify the DTDID. The DTDID must be the same as the system IDin the doctype.

[0624] 2. Specify the validation. When using DTDID, the validationshould be always be “YES”.

[0625] 3. Specify the prolog and doctype. For composition, the sameprolog and doctype are allowed.

[0626] E.2.1 Enabling an XML Collection

[0627] The purpose of enabling an XML Collection for decomposition is toparse a DAD, create new tables or check the mapping against existingtables. The DAD is stored into the XML_USAGE table when the XMLCollection is enabled.

[0628] When a user prefers to have the XML System create collectiontables, the user should enable the XML collection. Additionally, theenablement depends on the stored procedure the user chooses to use. Thestored procedure dxyInsertXML( ) will take XML Collection name as theinput parameter. In order to use the stored procedure dxxInsertXML( ),the user must enable an XML collection before calling it. The user cancall stored procedure dxxShredXML( ) without the enabling of an XMLcollection by passing a DAD. In the later caPse, all tables specifiedmust exist in the database.

[0629] E.2.1.1 Enabling XML Collection Option

[0630] For composition, an XML collection is not required to be enabled.The assumption is that all collection tables already exist in thedatabase. The stored procedure can take a DAD as an input parameter andgenerate XML documents based on the DAD. On the other hand, thecomposition is the opposite of the decomposition. For XML collectionsenabled during the decomposition process, the DAD is likely to be usedto compose XML documents again. If the same DAD is used, then thecollection can be enabled for both composition and decomposition.

[0631] An XML Collection can be enabled through the XML Systemadministration GUI (graphical user interface) or using the dxxadmcommand with the Enable_collection option. The syntax of the option on aDB2 server is as follows:

[0632] dxxadm enable_collection db_name collection_name DAD_file [ttablespace]

[0633] where

[0634] a. db_name: name of the database.

[0635] b. collection_name: name of the XML collection, which will beused as the parameter to the stored procedures.

[0636] c. DAD_file: Data Access Definition(DAD) file.

[0637] d. tablespace: Optional. The tablespace contains tables in theXML collection. If new tables need to be created for decomposition, thennew tables will be created in the specified tablespace.

[0638] The following is an example of enabling the XML collection calledsales_order in database mydb with the DAD_file Litem_DAD3.dad.

[0639] /home/u1>dxxadm enable_collection mydb sales_order Litem_DAD3.dad

[0640] DXXA009I XML System is enabling collection sales_order. Pleasewait.

[0641] DXXA010I XML System has successfully enabled the collectionsales_order.

[0642] /home/u>

[0643] The enable_collection option mainly does the following things toa database:

[0644] e. Read the DAD_file, call XML parser to parse DAD, and saveinternal information for mapping.

[0645] f. Store internal information into the XML_USAGE table.

[0646] The option is good for performance and is usually helpful toperform composition and decomposition using one DAD.

[0647] E.2.1.2 Enable_collection Option

[0648] The enable_collection option enables an XML collection associatedwith an application table. The association between the application tableand the side table specified by the DAD is through the root_id.

[0649] Syntax

[0650] dxxadm enable_collection db_name collection DAD_File

[0651] [-t tablespace] [-l login] [-p password]

[0652] Argument

[0653] g. db_name: the database name.

[0654] h. collection: the name of an XML collection.

[0655] i. DAD_File: the file containing the DAD.

[0656] j. tablespace: Optional. The tablespace containing a user tablespecified in the DAD or side table created by the XML System.

[0657] k. login: Optional. The user ID, which is only needed if thecommand is invoked from a DB2 client.

[0658] l. password: Optional. The password, which is only needed if thecommand is invoked from a DB2 client.

[0659] The enable_collection option will enable an XML collection. Theenablement process is to parse the DAD and prepare tables for XMLcollection access. It takes the database name, a name of the XMLcollection, a DAD_File and an optional tablespace. The XML collectionwill be enabled based on the DAD in the DAD_File. It checks whether thetables specified in the DAD exist. If the tables do not exist, the XMLSystem will create the tables according to the specification in the DAD.The column name and data type is taken from the RDB_node of anattribute_node or text_node. If the tables exist, the XML System willcheck whether the columns were specified with the right name and datatypes in the corresponding tables. If a mismatch is found, an error willbe returned. The tablespace is optional, but it is specified if thecollection tables are to be created in a tablespace other than thedefault tablespace of the specified database.

[0660] The enable_collection is required for decomposition storedprocedure dxxInsertXML( ), and its pairing dxxRetrieveXML( ), and thedxxUpdateXML( ). For stored procedure dxxGenXML( ) and dxxShredXML( )which take a DAD as input. the enablement of an XML collection is notrequired. For the latter stored procedures, it is assumed that alltables specified in the DAD exist in the database already. If they don'texist, an error will be returned. The enable_collection does have apairing disable_collection option. But the operation ofdisable_collection is much simpler. It just deletes the collection fromXML_USAGE table.

[0661]FIG. 7 is a diagram illustrating code organization to compose XMLdocuments. The dxxGenXML( ) stored procedure 700 and the dxxRetrieveXML() stored procedure 702 each invoke the dxxComposeXML( ) module 704.Decision block 706 determines whether RDB_node mapping is to beperformed. If RDB_node mapping is not to be performed, processingcontinues with SQL to XML module 708 and the SQL to XML mapping isperformed by the SQL to XML mapping technique 710. If RDB_node mappingis to be performed, processing continues with the dxxGenXMLFrom RDB( )module 712 and RDB_node mapping is performed by the RDBR technique.

[0662] E.3 Using SQL Mapping Scheme

[0663] The mapping between composed XML documents and an XML collectionis specified in the Xcollection of a DAD. The XML System adapts thenotation used in XPath and uses a subset of it to define the XMLdocument structure. In order to facilitate the mapping, the XML Systemintroduces the element SQL_stmt to the Xcollection.

[0664] The DAD defines the XML document tree structure using seven kindsof nodes defined by XPath:

[0665] root_node

[0666] element_node

[0667] text_node

[0668] attribute_node

[0669] namespace_node

[0670] processing_instruction_node

[0671] comment_node

[0672] The element SQL_stmt is designed to allow simple and directmapping from relational data to one or more XML documents through asingle SQL statement. It is useful for the composition when applicationprogrammers know exactly what data they want to select from a databaseand compose the one or more XML documents. The content of the SQL stmtmust be a valid SQL select statement. The columns in the SELECT clauseare mapped to XML elements or attributes. They will be used to definethe value of attribute_nodes or content of text_nodes. The FROM clausedefines the tables containing the data, and the WHERE clause specifiesthe join and search conditions.

[0673] In the definition of an Xcollection, for this embodiment of theinvention, the following approach is used to define the SQL mapping:

[0674] Provide direct SQL statement in one optional SQL_stmt, thenspecify the mapping between the columns in the SQL statement andtext_nodes or attribute_nodes in the XML data model. In this case, theXML System will use the SQL statement to select the data for compositionor insert data for decomposition.

[0675] The text_node and attribute_node will have a one-to-one mappingto/from a column in a relational table. Therefore, each of them willhave a column to define the mapping, where the column is needed for SQLmapping. It is possible that an element_node has no text_node but onlychild element_node(s).

[0676] The SQL mapping is simple and powerful. For SQL mapping, a usermay join all tables in one select statement to form a query.

[0677] The SQL mapping requires a user to supply an SQL stmt in a DAD.To simplify the demonstration, the following steps guide a user todefine an ‘Xcollection’ for composition, using SQL mapping. The composedXML document order.xml is in C.5, the given DTD LineItem.dtd is in C.4,and Litem_DAD2.dad in E.1.

[0678] Define only one SQL_stmt in the Xcollection. In the SQL_stmt,supply an SQL query which joins all tables of the XML collection andselect desired columns from these tables. The predicate as the querycondition should be specified in the WHERE clause. In the SELECT clause,the column name to be selected should be unique. In case of two columnsof different tables having the same name, use the AS to make themdifferent. This is because the column name specified in the selectclause will be used to identify the value of an attribute_node ortext_node.

[0679] Note, in the examples of this section, three tables: order_tab,part_tab and ship_tab are joined through the order_key and part_key, andorder_key, part_key, price, qty, tax, date, mode and comment areselected. The query condition is price>2500.00 and date>“1996-06-01”.

[0680] Define the “ORDER BY” clause at the end of SQL_stmt. For eachtable, define a column or a unique identifier so that the one or moreXML documents generated will be ordered by it. In the example, theprimary key column is used for order_tab and part_tab. However, it isnot necessary to use the primary key as long as it is unique andidentifies a row object in the table. It can be generated by using thegenerate_unique() function or a user defined function (UDF) and does notneed to be a real column name. However, “AS” followed by a conceptualcolumn name in the SQL_stmt must be used, first, then that name is usedin the “ORDER BY” clause. Ship_id is used in the example for ship_tab inthis way.

[0681] Define the tree data structure of the XML documents by specifyingthe root_node. The root_node should have only one child element_node, inthe example, the element_node for “Order”.

[0682] Specify column name of each attribute_node and text_node in thedocument tree structure. It must be specified, and the column name mustbe in the select clause of the SQL_stmt. A user can also specify anoptional condition to qualify the documents to be generated.

[0683] E.4 Detailed Techniques

[0684] E.4.1 Levels

[0685] In the relational data model, entities may have one-to-manyrelationships. For example, one order may have many parts, and one partmay have many shipments. If these relationships are visualized in theform of a tree, order could be regarded as the root, which has parts asits children, and each part has shipments as its children. For thisdiscussion, the different levels of this tree are called “relationallevels.”

[0686] An XML document also has a tree structure that consists ofelements at different levels in the tree. Unfortunately, these levels donot necessarily match the levels in the relational model. For example,although Customer and Part are at the same level in the XML tree (sinceboth are immediate children of Order), an Order may have multiple Partsand it can have only one Customer. Therefore, the element Part needs tobe generated in a way which is different from the way in which Customeris generated. To generate Parts, the XML System opens a new cursor andloops through parts. To generate Customer, the XML System retrieves datafrom the same descriptor (SQLDA) as Order, and no new cursor or loop isrequired.

[0687] In one embodiment, the implicit stack of recursion keeps track ofthe XML levels only. An additional data structure (a level map) is usedto keep track of the relational levels in order to make the techniquebehave properly.

[0688] To generate a level map, the columns of the SQL query result arepartitioned into equivalence classes, such that the columns in eachclass are at the same relational level. Because the notion of relationallevels are in the semantics of the data, it is generally impossible todeduce this partition information from the SQL query alone, especiallyin legacy databases where tables have been created without properdeclarations of primary keys and foreign keys.

[0689] In one embodiment, the user specifies the partition in a DAD bydeciding which pieces of data should “come together as a classconceptually.” In the example, order key, customer_name, andcustomer_email come together to form the conceptual class Order.Similarly, date and mode should come together to form the notion of ashipment.

[0690] In another embodiment, the partition is generated automaticallyusing some heuristics. One heuristic technique assumes that the columnsin the result of the SQL query are in a top-down order of the relationalhierarchy. It also requires a single-column candidate key to begin eachlevel. If such a key is not available in a table, the query generatesone for that table using a table expression and the built-in functiongenerate_unique( ). For further illustration, refer to the query in theexample to see how it handle ship_tab, which does not have asingle-column candidate key.

[0691] The technique selects distinct counts from the result of the SQLquery on the first column, the first and the second, the first and thesecond and the third and so on. It starts a new partition whenever itdetects a change in the distinct counts. Because of a restriction of the“select distinct” feature of DB2A, any character data longer than 254bytes will be truncated.

[0692] The following data structures are used:

[0693] levelmap

[0694] Levelmap is an associative array that maps column names to theirequivalence class numbers or “relational level.” The equivalence classesin ascending order of relational levels should have one-to-manyrelationship between each adjacent classes with the “many” side at theupper level. In the example, “order_key” maps to 0; “part_key” maps to1; and “date” maps to 2. The associative array can be implemented inmemory, for example, as a hash table, a sorted array, or a binary searchtree.

[0695] The following are methods of levelmap:

[0696] acquire( )

[0697] It uses some heuristics to determine the partition to initializethe level map. This method requires SQL access to dxxcache.

[0698] int getLevel(column)

[0699] It retrieves the relational level of the column.

[0700] char *[ ] getcolumns(int level)

[0701] It retrieves the column names of the specified level.

[0702] releases

[0703] It deallocates the memory that was allocated in acquires.

[0704] outbuf

[0705] It is an automatically expandable buffer for holding an XMLdocument during construction. At the end of the construction, outbufholds the entire XML document.

[0706] The following are methods of outbuf:

[0707] acquire(int estimated_size, location)

[0708] It allocates the buffer using the estimated size as the initialsize. The location parameter specifies whether the buffer will be basedon main memory or a disk file.

[0709] append(string)

[0710] It appends the string into the buffer, and expands the buffer ifnecessary.

[0711] char *getContent( )

[0712] It retrieves the content of the buffer if it is in memory or thefilename if it is in a file.

[0713] release( )

[0714] It deallocates the buffer.

[0715] E.4.2 Pseudocode for Implementation

[0716] The following is a set of pseudocode for implementing a storedprocedure to generate an XML document from a single SQL query in anembodiment of the invention:

[0717] dxxGenXML(dadbuf, result_tabname, m, n, sqlstate, msgtext)

[0718] dadbuf—a memory buffer containing a document access definition

[0719] result_tabname—the name of the table into which the resulting XMLdocuments will be stored

[0720] m—the maximum number of documents/rows to return; 0 if none.

[0721] n—OUT: the actual number of documents/rows generated

[0722] sqlstate—OUT: the SQL state in case of errors

[0723] msgtext—OUT: message text in case of errors dxxGenXML(dadbuf,result_tabname, m, n, sqlstate, msgtext) { /* Parse the DAD and preparethe SQL query */ dad = the DOM tree of dadbuf; Perform the query andsave the result into cache table,  db2xml.dxxcache; Initialize levelmapby a heuristic technique.  top_columns = levelmap->getColumns(0); top_query = “select distinct top_columns from dxxcache”; PREPARE q INTO:sqlda FROM :top_query; DESCRIBE q INTO :sqlda DECLARE cur CURSOR FOR q;OPEN cur; for i = (1..n) {  FETCH cur USING DESCRIPTOR sqlda; if(SQLCODE == +100) /* no more data */   goto exit; outbuf->acquire(estimated_size);  genXMLDoc(sqlda, levelmap, dad,  &outbuf, rc, msgtext);  /* outbuf now contains an XML document */ INSERT INTO result_tabname VALUES outbuf;  outbuf->release(); } msgtext= “Result exceeds maximum.”  “Only the first n documents are returned.”;exit:  CLOSE cur;  Drop table dxxcache; } genXMLDoc(sqlda, levelmap,dad, outbuf, rc, msgtext) { /* Generate an XML document */ /* Retrievethe header information.*/ Get the prolog and doctype from dad by DOMAPI.  and write them to outbuf; /* Get the root element from DAD.*/ Getthe element node under the root node from dad by DOM API.genXMLElement(root_element_node, sqlda, 0, levelmap,  outbuf, rc,msgtext); } genXMLElement(node, sqlda, currentlevel, levelmap, outbuf,rc, msgtext) { Add the opening of the element start-tag to outbuf;  Getall attribute children of this element from dad by DOM API.  For eachattribute,   genXMLAttribute(attrnode, sqlda,    outbuf, rc, msgtext); Add the closing of the element start-tag to outbuf;  Get all otherchildren of this element from dad by DOM API.  For each child, do {   ifthe child is a text node {    genXMLText(node, sqlda, outbuf, rc,msgtext);   } else {    Get the level of the first column of this childby DOM    API.    if(childlevel < currentlevel)     handle the error:   if(childlevel == currentlevel) {     genXMLElement(childnode, sqlda,currentlevel,     levelmap, outbuf, rc, msgtext)    } else {   /* Openanother cursor to generate    possible multiple occurrences at    thislevel.*/   Get the columns of the child's level by  levelmap->getColumns(childlevel).   Construct a where-clause fromsqlda by equating    all pairs of column names and values.   query =“select distinct columns from dxxcache”    “wherewhere_clause_from_sqlda”;   PREPARE q INTO :childsqlda FROM :query;  DESCRIBE q INTO :childsqlda   DECLARE childcur CURSOR FOR q;   OPENchildcur;   while (1) {    FETCH childcur USING DESCRIPTOR childsqlda;   if(SQLCODE == +100) /* no more data */     goto done;   genXMLElement(childnode, childsqlda, childlevel,    levelmap,    &outbuf, rc, msgtext);    }   }  done:   CLOSE childcur;  }  Addelement end-tag to outbuf; }  genXMLAttribute(node, sqlda, outbuf, rc,msgtext) {  Get the value of the attribute from its column in sqlda. Append to outbuf: “name = \“value\””; }  genXMLText(node, sqlda,outbuf, rc, msgtext) {  Get the text of node from its column in sqlda. Append the text to outbuf. }

[0724] Given the approach taken for the formulation of a query, there isa problem of duplicated data in certain higher-level columns, such ascustomer_name in the example, to be tackled. A solution to this problemis to group or “aggregate” the columns that have one-to-one mapping,into an equivalence class. An advantage of this solution is that the XMLSystem does not need to parse the user's SQL query.

[0725] To eliminate the duplicates in higher-level columns, the resultof the SQL query is traversed at least once for each level. By saving,the result into a cache, such as a temporary table, executing the querymultiple times is avoided. The size of this cache table is usuallysmaller than some of the original user tables and no join is needed forquerying, the cache.

[0726] To return a result set, the stored procedure opens a cursor andleaves it open. The stored procedure still needs a table for the queryfor which the cursor is declared.

[0727] In DB2®, a result set is available only to client programs thatare written using Call Level Interface (CLI) and not using static SQL.On the other hand, any SQL client can ha e access to a result table.

[0728] E.4.3 Code Organization

[0729] As for code organization, the XML System code consists of astored procedure, dxxGenXML, some SQL C functions called by the storedprocedure, and a few C++ classes or C structs for defining the necessarydata structures. The data structures are defined as C structs because ofthe rules of DB2® Extenders. The module can be linked into the db2xmlDLL with other stored procedures. It interacts with an XML4C parserusing the single document interface functions: dxxInitializeParser anddxxDOM, which have already been implemented and used by enable column.

[0730]FIG. 7 is a diagram illustrating code organization to compose XMLdocuments. The dxxGenXML( ) stored procedure 700 and the dxxRetrieveXML() stored procedure 702 each invoke the dxxComposeXML( ) module 704.Decision block 706 determines whether RDB_node mapping is to beperformed. If RDB_node mapping is not to be performed, processingcontinues with SQL to XML module 708 and the SQL to XML mapping isperformed by the SQL to XML mapping technique 710. If RDB_node mapping)is to be performed, processing continues with the dxxGenXMLFrom RDB( )module 712 and RDB_node mapping is performed by the RDBR technique.

[0731] E.5 Components and Flow Diagram

[0732]FIG. 8 is a block diagram illustrating components of the XMLSystem in one embodiment of the invention. Relational tables 800 storerelational data. A Document Access Definition (DAD) 802 defines anXcollection 804 and a SQL query 806. A Document Type Definition (DTD)808 is used to validate and define the DAD 802. The SQL query is used toretrieve data from the relational tables 800. Using the DAD 802, the SQLquery 806, and the XML composition stored procedures 810, the XML systemgenerates one or more XML documents 812. The XML system stores the dataused to generate the one or more XML documents in an XML Collectiontable 814. Although the relational tables and XML Collection tables areshown in different data storage devices 800 and 814, both types oftables could reside at one data storage device.

[0733]FIG. 9 is a flow diagram illustrating the steps performed by theXML System to transform relational data into one or more XML documentsusing SQL mapping. In block 900, the XML System receives a DADcomprising an Xcollection definition. The Xcollection definitionincludes a SQL_query element, which is a valid SQL query. In block 902,the XML System parses the DAD and prepares the SQL_query. In block 904.the XML System retrieves data selected by the SQL_query and stores thedata in a cache table. In block 906, the XML System removes duplicatesusing XML composition stored procedures. In block 908, the XML Systemgenerates one or more XML documents using the SQL_query, the XMLcomposition stored procedures, and the DAD. In particular, the XMLSystem uses these components to map each column of the retrieved data toan XML element or attribute. Then, the XML System stores the data usedto generate the one or more XML documents in an XML Collection. Oneskilled in the art would recognize that the one or more XML documentscould be stored in another manner, for example, in other types of tablesor as a file.

F. GENERATING ONE OR MORE XML DOCUMENTS FROM A RELATIONAL DATABASE USINGTHE XPATH DATA MODEL

[0734] This invention presents a technique for generating one or moreXML documents from relational database tables using the XML PathLanguage (Xpath) data model. XPath models an XML document as a tree ofnodes, including element nodes, attribute nodes and text nodes. Thesenodes form a Document Object Model (DOM) tree.

[0735] In particular, the technique of the invention traverses aDocument Object Model (DOM) tree generated from an XML formatted DataAccess Definition (DAD), generates hierarchical SQL statements to querydata from relational tables, then generates one or more structured XMLdocuments. Using this invention, a user can directly map data in anexisting database to one or more XML documents, without requiring thetransformation from data in a relational database to data in anintermediate XML format.

[0736] This invention implements a stored procedure that takes a DataAccess Definition (DAD) and a name of a result table and returns aresult table that is populated with the one or more generated XMLdocuments. The DAD defines the mapping from the relational tables to theone or more generated XML documents. In a preparation stage, thetechnique traverses a DOM tree to gather information of each databasetable to be used in generating one or more XML documents. Then, thetechnique will generate SQL statements, query relational data, and writeXML document tree contents in a recursive manner. During the recursiveprocessing, the SQL statements are generated by using the previouslyprepared information and passing join values down from a higher levelSQL query to a lower level WHERE clause. The result of each query willbe taken as the XML attribute value and element text to be written tothe output XML documents.

[0737]FIG. 10 is a flow diagram illustrating the process performed bythe XML system using RDB_node mapping to compose XML documents. In block1000, the XML system generates a document object model (DOM) tree froman XML formatted data access definition (DAD). The tree comprisesrelational database nodes (i.e., element nodes, attribute nodes, andtext nodes). In block 1002, the XML system traverses the DOM tree togenerate SQL queries. In particular, the relational database nodesidentify relational tables and columns from which relational data is tobe retrieved, along with predicates and join relationships betweentables. In block 1004, the XML system executes SQL queries to retrieverelational data. In block 1006, the XML system maps the relational datato one or more XML documents using the DAD. The DAD defines a mappingbetween the relational data and one or more XML documents. Furthermore,the relational data maps to attribute values or element text of an XMLdocument.

[0738] F. 1 Example

[0739] The following is an example of generating an XML document from arelational database using an RDB_node (which defines the mapping betweenan XML element or attribute and relational data) in the DAD. Inparticular. a relational database is illustrated. Then, the results ofperforming SQL queries against the relational database are illustrated.Moreover, the Document Type Definition (DTD) and Document AccessDefinition (DAD) are provided. After this, one XML document that isgenerated to contain the data retrieved by the SQL query is illustrated.

[0740] Relational Database:

[0741] order_tab: order_key customer_name customer_email customer_phone1 General Motor parts@gm.com 800-GM-PARTS

[0742] part_tab: part_key color qty price tax order_key 156 red 1717954.55 0.02 1  68 black 36 34850.16 0.06 1 128 red 28 38000.00 0.07 1

[0743] ship_tab: date mode comment part_key 1998-03-13 TRUCK This is thefirst shipment to 156 service of GM. 1999-01-16 FEDEX This the secondshipment to 156 service of GM. 1998-08-19 BOAT This shipment isrequested  68 by a call from GM marketing. 1998-08-19 AIR This shipmentis ordered by  68 an email. 1998-12-30 TRUCK NULL 128

[0744] The following is an XML Document that is to be generated from theabove relational data: <?xml version=“1.0”?> <!DOCTYPE Order SYSTEM“E:\dxx\test\dtd\litem.dtd”> <Order key=“1”>  <Customer>   <Name>GeneralMotor</Name>   <Email>parts@gm.com</Email>  </Customer>  <Partcolor=“red”>   <key>68</key>   <Quantity>36</Quantity>  <ExtendedPrice>34850.16</ExtendedPrice>   <Tax>0.06</Tax>   <Shipment>   <ShipDate>1996-04-12<ShipDate>    <ShipMode>BOAT</ShipMode>  </Shipment>   <Shipment>    <ShipDate>1998-08-19</ShipDate>   <ShipMode>AIR</ShipMode>   </Shipment>  </Part>  <Part color=“red”>  <key>128</key>   <Quantity>28</Quantity>  <ExtendedPrice>38000.00</ExtendedPrice>   <Tax>0.07</Tax>   <Shipment>   <ShipDate>1998-12-30</ShipDate>    <ShipMode>TRUCK</ShipMode>  </Shipment>  </Part> </Order>

[0745] Assuming the following structure of an XML document will begenerated from the data selected by a SQL_stmt, how to use an XMLcollection to specify the DAD will be illustrated below. <?xmlversion=“1.0”?> <!DOCTYPE Order SYSTEM “E:\dtd\dxxdad.dtd”> <DAD> <dtdid>E:\dtd\lineItem.dtd</dtdid>  <validation>YES</validation> <Xcollection>  <prolog>?xml version=“1.0”?</prolog>  <doctype>!DOCTYPEOrder SYSTEM “E:\dtd\lineItem.dtd”</doctype>  <root_node>  <element_nodename=“Order”>   <RDB_node>   <table name=“order_tab”/>   <tablename=“part_tab”/>   <table name=“ship_tab”/>   <condition>   order_tab.order_key = part_tab.order_key AND    part_tab.part_key =ship_tab.part_key   </condition>   </RDB_node>   <attribute_nodename=“Key”>   <RDB_node>    <table name=“order_tab”/>    <columnname=“order_key”/>   </RDB_node>   </attribute_node>   <element_nodename=“Customer”>   <element_node name=“Name”>    <text_node>    <RDB_node>     <table name=“order_tab”/>     <columnname=“customer_name”>     </RDB_node>    </text_node>   </element_node>  <element_node name=“Email”>    <text_node>     <RDB_node>     <tablename=“order_tab”/>     <column name=“customer_email”/>     </RDB_node>   </text_node>   </element_node>  </element_node>  <element_nodename=“Part”>   <attribute_node name=“Key”>    <RDB_node>    <tablename=“part_tab”>    <column name=“part_key”>    </RDB_node>  </attribute-node>   <element_node name=“ExtendedPrice”>    <text_node>   <RDB_node>     <table name=“part_tab”/>     <column name=“price”/>    <condition>     price &amp;gt; 2500.00     </condition>   </RDB_node>    </text_node>   </element_node>   <element_nodename=“Tax”>    <text_node>    <RDB_node>     <table name=“part_tab”>    <column name=“tax”/>    </RDB_node>    </text_node>   <element_nodename=“Part”>    <attribute_node name=“Key”>    <RDB_node>     <tablename=“part_tab”/>     <column name=“part_key”/>    </RDB_node>   </attribute_node>    <element_node name=“Quantity”>    <text_node>   <RDB_node>     <table name=“part_tab”/>     <column name=“qty”/>   </RDB_node>    </text_node>    </element_node>   </element_node>  <element_node name=“shipment”>    <element_node name=“ShipDate”>   <text_node>    <RDB_node>     <table name=“ship_tab”/>     <columnname=“date”/>     <condition>     date &amp;gt; “1966-01-01”    </condition>    </RDB_node>    </text_node>    </element_node>   <element_node name=“ShipMode”>    <text_node>    <RDB_node>    <table name=“ship_tab”/>     <column name=“mode”/>    </RDB_node>   </text_node>    </element_node>    <element_node name=“Comment”>   <text_node>    <RDB_node>     <table name=“ship_tab”/>     <columnname=“comment”/>    </RDB_node>    </text_node>    </element_node>  </element_node> <!-- end of element Shipment>   </element_node> <!--end of element Part --->  </element_node> <!-- end of element Order ---></root_node> </Xcollection> </DAD>

[0746] Assuming the XML documents need to be composed or decomposed arelike the one shown in the example in Section F.1, the following sampleDAD shows how to define the mapping from relational tables usingRDB_node Mapping. In particular, the following example DAD shows how tocompose/decompose a set of XML documents from/into three relationaltables while using the RDB_node to specify the mapping. Litem_DAD3.dad<?xml version=“1.0”?> <!DOCTYPE Order SYSTEM “E:\dtd\dad.dtd”> <DAD><dtdid>E:\dtd\lineItem.dtd</dtdid> <validation>YES</validation><Xcollection> <prolog>?xml version=“1.0”?</prolog> <doctype>!DOCTYPEOrder SYSTEM “E:\dtd\lineItem.dtd”</doctype> <root_node> <element_nodename=“Order”> <RDB_node> <table name=“order_tab” key=“order_key”/><table name=“part_tab” key=“part_key”/> <table name=“ship_tab”key=“date”/> <condition> order_tab.order_key = part_tab.order_key ANDpart_tab.part_key = ship_tab.part_key </condition> </RDB_node><attribute_node name=“Key”> <RDB_node> <table name=“order_tab”/> <columnname=“order_key” type=“integer”/> </RDB_node> </attribute_node><element_node name=“Customer”> <text_node> <RDB_node> <tablename=“order_tab”/> <column name=“customer” type=“char(128)”/></RDB_node> </text_node> </element_node> <element_node name=“Part”><attribute_node name=“Key”> <RDB_node> <table name=“part_tab”/> <columnname=“part_key” type=“integer”/> </RDB_node> </attribute_node><element_node name=“Quantity”> <text_node> <RDB_node> <tablename=“part_tab”/> <column name=“qty” type=“integer”/> </RDB_node></text_node> </element_node> <element_node name=“ExtendedPrice”><text_node> <RDB_node> <table name=“part_tab”/> <column name=“price”type=“real”/> <condition> price &amp;gt; 2500.00 </condition></RDB_node> </text_node> </element_node> <element_node name=“Tax”><text_node> <RDB_node> <table name=“part_tab”/> <column name=“tax”type=“real”/> </RDB_node> </text_node> </element_node> <element_nodename=“shipment”> <element_node name=“ShipDate”> <text_node> <RDB_node><table name=“ship_tab”/> <column name=“date” type=“date”/> <condition>date &amp;gt; “1966-01-01” </condition> </RDB_node> </text_node></element_node> <element_node name=“ShipMode”> <text_node> <RDB_node><table name=“ship_tab”/> <column name=“mode” type=char(120)”/></RDB_node> </text_node> </element_node> <element_node name=“Comment”><text_node> <RDB_node> <table name=“ship_tab”/> <column name=“comment”type=“varchar(2k)”/> </RDB_node> </text_node> </element_node></element_node> <! -- end of element Shipment> </element_node> <! -- endof element Part ---> </element_node> <! -- end of element Order ---></root_node> </Xcollection> </DAD>

[0747] F.2 How to Use an XML Collection

[0748] An XML collection is a set of relational tables which contain XMLdata. These tables can be new tables generated by the XML System whendecomposing XML documents or existing tables which have data to be usedby the XML System to generate XML documents. Stored procedures providedby the XML System serve as the access methods. Unlike the XML column, anXML collection does not have to be enabled. The enablement is based onthe operations performed.

[0749] A composition operation of an XML collection generates one ormore XML documents from data existing in the collection tables.Therefore, for this operation, an XML collection does not need to beenabled, providing all tables already exist in the database. The DADwill be passed to a stored procedure. The DAD can be overridden by otherXML query parameters as the stored procedure input parameters. This kindof parameter can be obtained from the Web dynamically.

[0750] In the DAD preparation, “Xcollection” is defined first. AnXcollection can be defined for composition with RDB_node mapping. Thefollowing steps apply:

[0751] Specify the DTDID. The DTDID must be the same as the system ID inthe doctype.

[0752] Specify the validation. When using DTDID, the validation shouldbe “YES”.

[0753] Specify the prolog and doctype. For composition, the same prologand doctype are allowed.

[0754] Specify RDB_node mapping, which will be described in Section F.5below.

[0755] When using the RDB_node mapping, the RDB_node should be definedfor a root element_node and each text_node and attribute_node. TheRDB_node defines the table and column in the relational database whichis to be mapped to an XML element or attribute.

[0756] The following illustrates the mapping with RDB_nodes by thesample DAD Litem_DAD3.dad. This basically describes how to specify theRDB_node.

[0757] Define the tables and the predicate to join these tables for theroot element_node (the element_node under the root_node). All tableswhich contribute data for the root element should be included and thecondition to join these tables should be stated. The primary and foreignkey relationship for join is strongly recommended, but not required. Inthe example, the root element_node Order's RDB_node has three tables:order_tab, part_tab and ship_tab as shown below.

[0758] For each attribute_node or text_node, the table and column whichcontains its data are specified. If a user has a select condition thepredicate in the condition of the RDB_node is specified. In the example,for text_node “ExtendedPrice”, the table is specified as part tab,column as price, and the condition as “price>2500.00”.

[0759] In the RDB_node mapping, the XML System will traverse thedocument tree structure to generate the XML documents. <element_nodename=“Order”>  <RDB_node>  <table name=“order_tab” key=“order_key”/> <table name=“part_tab” key=“part_key”/>  <table name=“ship_tab”key=“date” orderBy=“date”/>  <condition>   order_tab.order_key =part_tab.order_key AND   part_tab.part_key = ship_tab.part_key </condition>  </RDB_node> . . .

[0760] F.2.1 Enabling an XML Collection

[0761] The purpose of enabling an XML Collection for decomposition is toparse a DAD, create new tables or check the mapping against existingtables. The DAD is stored into the XML_USAGE table when the XMLCollection is enabled.

[0762] When a user prefers to have the XML System create collectiontables, the user should enable the XML collection. Additionally, theenablement depends on the stored procedure the user chooses to use. Thestored procedure dxxInsertXML( ) will take XML Collection name as theinput parameter. In order to use the stored procedure dxxInsertXML( ),the user must enable an XML collection before calling it. The user cancall stored procedure dxxShredXML( ) without the enabling of an XMLcollection by passing a DAD. In the later case, all tables specifiedmust exist in the database.

[0763] F.2.1.1 Enabling XML Collection Option

[0764] For composition, an XML collection is not required to be enabled.The assumption is that all collection tables already exist in thedatabase. The stored procedure can take a DAD as an input parameter andgenerate XML documents based on the DAD. On the other hand, thecomposition is the opposite of the decomposition. For XML collectionsenabled during the decomposition process, the DAD is likely to be usedto compose XML documents again. If the same DAD is used, then thecollection can be enabled for both composition and decomposition.

[0765] An XML Collection can be enabled through the XML Systemadministration GUI (graphical user interface) or using the dxxadmcommand with the Enable_collection option. The syntax of the option on aDB2 server is as follows:

[0766] dxxadm enable_collection db_name collection_name DAD_file [-ttablespace]

[0767] where

[0768] a. db_name: name of the database.

[0769] b. collection_name: name of the XML collection, which will beused as the parameter to the stored procedures.

[0770] c. DAD_file: Data Access Definition(DAD) file.

[0771] d. tablespace: Optional. The tablespace contains tables in theXML collection. If new tables need to be created for decomposition, thennew tables will be created in the specified tablespace.

[0772] The following is an example of enabling the XML collection calledsales_order in database mydb with the DAD_file Litem_DAD3.dad.

[0773] /home/u1>dxxadm enable_collection mydb sales_order Litem_DAD3.dad

[0774] DXXA009I XML System is enabling collection sales_order. Pleasewait.

[0775] DXXA010I XML System has successfully enabled the collectionsales_order.

[0776] /home/u1>

[0777] The enable_collection option mainly does the following things toa database:

[0778] e. Read the DAD_file, call XML parser to parse DAD, and saveinternal information for mapping.

[0779] f. Store internal information into the XML_USAGE table.

[0780] The option is good for performance and is usually helpful toperform composition and decomposition using one DAD.

[0781] F.2.1.2 Enable collection Option

[0782] The enable_collection option enables an XML collection associatedwith an application table. The association between the application tableand the side table specified by the DAD is through the root_id.

[0783] Syntax

[0784] dxxadm enable_collection db_name collection DAD_File

[0785] [-t tablespace] [-l login] [-p password]

[0786] Argument

[0787] g. db_name: the database name.

[0788] h. collection: the name of an XML collection.

[0789] l. DAD_File: the file containing the DAD.

[0790] j. tablespace: Optional. The tablespace containing a user tablespecified in the DAD or side table created by the XML System.

[0791] k. login: Optional. The user ID, which is only needed if thecommand is invoked from a DB2 client.

[0792] l. password: Optional. The password, which is only needed if thecommand is invoked from a DB2 client.

[0793] The enable_collection option will enable an XML collection. Theenablement process is to parse the DAD and prepare tables for XMLcollection access. It takes the database name, a name of the XMLcollection, a DAD_File and an optional tablespace. The XML collectionwill be enabled based on the DAD in the DAD_File. It checks whether thetables specified in the DAD exist. If the tables do not exist, the XMLSystem will create the tables according to the specification in the DAD.The column name and data type is taken from the RDB node of an attributenode or text_node. If the tables exist, the XML System will checkwhether the columns were specified with the right name and data types inthe corresponding tables. If a mismatch is found, an error will bereturned. The tablespace is optional, but it is specified if thecollection tables are to be created in a tablespace other than thedefault tablespace of the specified database.

[0794] The enable_collection is required for decomposition storedprocedure dxxInsertXML( ), and its pairing dxxRetrieveXML( ), and thedxxUpdateXML( ). For stored procedure dxxGenXML( ) and dxxShredXML( )which take a DAD as input, the enablement of an XML collection is notrequired. For the latter stored procedures, it is assumed that alltables specified in the DAD exist in the database already. If they don'texist, an error will be returned. The enable_collection does have apairing disable_collection option. But the operation ofdisable_collection is much simpler. It just deletes the collection fromXML_USAGE table.

[0795] As discussed in Section E, FIG. 7 is a diagram illustrating codeorganization to compose XML documents. The dxxGenXML( ) stored procedure700 and the dxxRetrieveXML( ) stored procedure 702 each invoke thedxxComposeXML( ) module 704. Decision block 706 determines whetherRDB_node mapping is to be performed. If RDB_node mapping is not to beperformed, processing continues with SQL to XML module 708 and the SQLto XML mapping is performed by the SQL to XML mapping technique 710. IfRDB_node mapping is to be performed, processing continues with thedxxGenXMLFrom RDB( ) module 712 and RDB_node mapping is performed by theRDBR technique.

[0796] F.3 Using RDB Node Mapping Scheme

[0797] The mapping between composed/decomposed XML documents and an XMLcollection is specified in the Xcollection of a DAD. The XML Systemadapts the notation used in XSLT and uses a subset of it to define theXML document structure. In order to facilitate the mapping, the XMLSystem introduces the element Relational DataBase node (RDB_node) to theXcollection.

[0798] The DAD defines the XML document tree structure using seven kindsof nodes defined by XSLT/XPath:

[0799] root_node

[0800] element_(—node)

[0801] text_node

[0802] attribute_node

[0803] namespace_node

[0804] processing_instruction_node

[0805] comment_node

[0806] For simple and complex compositions, the RDB_Node is used todefine where the content of an XML element or value of an XML attributeis to be stored or retrieved.

[0807] The RDB_Node has the following components:

[0808] Table: the name of the relational table or updateable view, inwhich the XML element content or attribute value is to be stored.

[0809] Column: the name of the column which contains the element contentor attribute value.

[0810] Condition: the predicates in the WHERE clause to select thedesired column data.

[0811] In the definition of an Xcollection, for this embodiment of theinvention the following approach is used to define the mapping:

[0812] RDB_node Mapping: Specify RDB node for each text_node andattribute_node, and the root element_node in the XML data model. In thiscase, the XML System will generate SQL statements based on the RDB_nodesand document tree structure.

[0813] The text_node and attribute_node will have a one-to-one mappingto/from a column in a relational table. Therefore, each of them willhave a RDB_node to define the mapping, where the RDB_node is needed forthe RDB_node mapping. It is possible that an element_node has notext_node but only child element_node(s).

[0814] Using a RDB_node to specify each text_node and attribute_node ismore general. Only the root element_node needs to have a RDB_node. Inthis RDB_node, the user is required to specify all tables used tocompose/decompose data, as well as a join condition among these tables.The condition predicate in this RDB_node will be pushed down from theroot element_node to all child nodes. Ideally, the way to tie all tablestogether within an XML collection is the primary-foreign keyrelationship. However, it often happens that some existing user tablesdo not have such a relationship. Therefore, requiring the foreign keyrelationship for composition is too restrictive. However, in the case ofdecomposition, if new tables are created for storing the decomposed XMLdata, then the DAD requires a user to specify the primary key of eachtable, as well as the primary-foreign key relationship among tables.

[0815] F.4 Detailed Techniques

[0816] The following discussion focuses on the technique for oneembodiment of the invention. There are two major phases of thetechnique: the preparation phase and the generating phase.

[0817] F.4.1 Preparation Phase

[0818] In this phase, the relational structure is generated byprocessing the RDB_node of the root element_node. It is required that inthis RDB_node, all tables contribute data to the XML document to belisted, as well as the join conditions between these tables.

[0819] In this phase, a mapping is added between a relational column andan XML attribute value or element text to the relational tablestructure, so that the technique tracks where the relational data isfrom.

[0820] F.4.2 Generating Phase

[0821] From the root element_node, the technique traverses the DAD DOMTree, using the relational information recorded in a REL data structureprepared in the first phase to generate a SQL statement. Then the dataselected is used to fulfill the XML attribute value or text of anelement.

[0822] F.4.3 Data Structures

[0823] The following data structures are used by the invention:

[0824] The following data structures are used for decomposition of XMLdocuments using RDB-nodes. /*----------------------------------- *Column to XML node pair *----------------------------------*/ typedefstruct col2xml { char col[DB2_TAB_COL_VIEW_LEN]; chardataType[DB2_TAB_COL_VIEW_LEN+8];/* data type of the column */ charxml[DXX_XML_FIELD_SIZE]; /* xml can be attribute or element name */ intxmlType; /* DXX_ATTRIBUTE or DXX_TEXT */ int xmlLevel; /* fordetermining if a column  needs to be copies to the  next row */ intindexInSQLDA; } DXX_COL2XML; /*----------------------------------- *Table and Column pair *----------------------------------*/ typedefstruct tabcol { char tab[DB2_TAB_COL_VIEW_LEN]; charcol[DB2_TAB_COL_VIEW_LEN]; int indexInSQLDA; } DXX_TAB_COL;/*----------------------------------- * Join information*----------------------------------*/ typedef struct join_info { charcol[DB2_TAB_COL_VIEW_LEN];  /* column used in the join condition */ intnum_join; /* number of foreign columns to be joined */ DXX_TAB_COLforeign[DXX_NUM_FOREIGN];  /* foreign column and its table to be joined*/ } DXX_JOIN_INFO; /*----------------------------------- * Primary keyInformation: *----------------------------------*/ typedef structpri_key { int num_col; /* number of columns in the key */ charname[DXX_NUM_MAPPING][DB2_TAB_COL_VIEW_LEN];  /* the column names of theprimary key */ } DXX_PRIKEY; /*----------------------------------- *Foreign key Information: *----------------------------------*/ typedefstruct for_key { int num_col; /* number of columns in the key */DXX_COL2XML col[DXX_NUM_MAPPING];  /* the columns of the foreign key */} DXX_FORKEY; /*----------------------------------- * Table Information:*----------------------------------*/ typedef struct tab { charname[DB2_TAB_COL_VIEW_LEN]; /* name of the table */ DXX_PRIKEY pri_key;/* primary key */ DXX_FORKEY for_key; /* foreign key */ int level; /*relational level of the table */ char top_element[DXX_XML_FIELD_SIZE]; /* the highest level of XML element using  * column data of this table*/ char sql_stmt[MAX_STMT_LEN]; /* SQL statement */ SQLHSTMT hstmt; /*CLI statement handle */ struct sqlda *sqldaPtr; /* pointer of selecteddata */ int num_col; /* number of columns whic data used in the  * XMLdocument */ DXX_COL2XML col2xml[DXX_NUM_MAPPING];  /* mapping betweencolumn in this table to  * XML attribute or element text */ intnum_join; /* number of columns in this table which  * will form joinconditions */ DXX_JOIN_INFO join_info[DXX_NUM_JOIN_IN_TABLE]; /* joininformation * char condition[DXX_CONDITION_IN_REL]; /* condition ofselect for columns in this * table */ } DXX_TAB;/*----------------------------------- * Relation Information for entireXML documents *----------------------------------*/ typedef struct rel {int current_level; /* the current XML level during treversal */ intnum_tables; /* number tables to generate/decompose XML doc*/ DXX_TAB*tab[DXX_NUMTAB_IN_REL]; /* details of each table */ char**top_elements; /* index to tab[i]->top_element for fast search. */SQLHDBC hdbc; /* CLI connection handle */ } DXX_REL;/*----------------------------------- * Data structure to store a row*----------------------------------*/ def struct row { int num_col; char**coldata; /* Array of pointers */ } DXX_ROW;/*----------------------------------- * Data structure to store the rows*----------------------------------*/ typedef struct rows { intnum_rows; DXX_ROW *row[DXX_MAX_ROWS]; } DXX_ROWS;

[0825] dxxGenXMLFromRDB(hdbc,dad, result_tabname,overrideType,overridem, n,) dxxGenXMLFromRDB(hdbc,dad, result_tabname,overrideType,overridem, n.) {  working variable:  char[] heading_buf;   DOMtree dad;  DOMNodetop_element_node;  DXX_REL relation_info;  DXX_REL *rel =&relation_info;  DXX_OVERRIDE dxx_override;  DXX_OVERRIDE_COND*dxx_override_ptr =  (DXX_OVERRIDE_COND*) &dxx_override;  /* initializeoverride data sturcture */ dxxInitOverride(overrideType.override.dxx_override);  /* set relationalstructure */  setRel(rel,dad);  root_element_node = get firstelement_node from dad:  /* process relational information */ process_rel(rel.root_element_node,dxx_override);  /* Retrieve theheader information. */  Get the prolog and doctype from dad by DOM API, and write them to heading_buf;  /* process element_node and generateXML */ process_root_element(hdbc,rel,root_element_node,result_tabname. heading_buf); }

[0826] Initialize Override Data Structure

[0827] This is used for XML_OVERRIDE. Parse the input override parameteraccording to the overrideType, then break the conditions into an arraystructure, where each entry has a path and predicate.

[0828] dxxInitOverride(overrideType,override,dxx_override)dxxInitOverride(overrideType,override,dxx_override) { CheckoverrideType; if (overrideType ==XML_OVERRIDE) { loop through inputoverride string { parse override path expression and fill intodxx_override array entry: store path in dxx_override entry's path: storepredicate in dxx_override entry's predicate; } }

[0829] Setup Relational Structure

[0830] This routine will process the RDB_node of the top element codeand initialize the REL structure of entire XML documents. After theprocess, all tables involved in composing/decomposing XML. Documentsshould be included in the REL structure, and relationship between tablesshould also be recorded in the REL. setReI(rel,dad dxx_type) { workingvariable: DXX_TAB *current_tab; get the RDB_node of thetop_element_node; if no RBD_node return error; for each table inRDB_node do { allocate space for current_tab; current_tab->name =RDB_node's table name: if (dxx_type ==DXX_DECOMPOSE)current_tab->primary_key_name = RDB_node's table key;current_tab->top_element = NULL; current_tab->level = 0;current_tab->num_col 0; current_tab->selectDaPtr = NULL;current_tab->condition = NULL; if (RDB_node's condition != NULL) scanthe condition of RDB_node, current_tab->num_join = 0; for eachcurrent_tab.column in the predicate do{ i = current_tab->num_join;current_tab->join_info[i].col ==curent_tab.column;current_tab->join_info[i].num_join = 0; for each other_tab.col =current_tab.column do{ j = current_tab->join_info[i].num_join;current_tab->join_info[i].foreign[j].col = col in other_tabcurrent_tab->join_info[i].foreign[j].tab = other_tabcurrent_tab->join_info[i].foreign[j].indexInSQLDA = DXX_NONEcurrent_tab->join_info[i].num_join+ +; } current_tab->num_join+ +; }rel->tab[i]= current_tab; rel->num_tables+ +; } Example:rel->tab[0].name = “order_tab”; rel->tab[0].primary_key_name =“order-key”; rel->tab[0].level = 0; rel->tab[0].top_element = NULL;rel->tab[0].selectDaPtr = NULL; rel->tab[0].num_col = 0;rel->tab[0].num_join = 1; rel->tab[0]->join_info[0]->col = “order_key”;rel->tab[0]->join_info[0]->num_join = 1;rel->tab[0]->join_info[0]->foreign[0].col = “o_key”;rel->tab[0]->join_info[0]->foreign[0].tab = “part_tab”;rel->tab[0]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[0]->condition = NULL; rel->tab[1].name = “part_tab”;rel->tab[1].primary_key_name = “part_key”; rel->tab[1].level = 0;rel->tab[1].top_element = NULL; rel->tab[1].selectDaPtr = NULL;rel->tab[1].num_col = 0; rel->tab[1].num_join 2;rel->tab[1]->join_info[0]->col = “o_key”rel->tab[1]->join_info[0]->num_join = 1;rel->tab[1]->join_info[0]->foreign[0].col = “order_key”;rel->tab[1]->join_info[0]->foreign[0].tab = “order_tab”;rel->tab[1]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[1]->join_info[1]->col = “part_key”;rel->tab[1]->join_info[1]->num_join = 1;rel->tab[1]->join_info[1]->foreign[0].col = “p_key”;rel->tab[1]->join_info[1]->foreign[0].tab = “ship_tab”;rel->tab[1]->join_info[1]->foreign[0].indexlnSQLDA = DXX_NONE;rel->tab[1]->condition = NULL; rel->tab[2].name = “ship_tab”;rel->tab[2].primary_key_name = “ship date”; rel->tab[2].top_element =NULL; rel->tab[2].level = 0; rel->tab[2].selectDaPtr = NULL;rel->tab[2].num_col = 0; rel->tab[2].num_join = 1;rel->tab[2]->join_info[0]->col = “p_key”;rel->tab[2]->join_info[0]->num_join = 1;rel->tab[2]->join_info[0]->forign[0].col = “part_key”rel->tab[2]->join_info[0]->forign[0].tab = “part_tab”:rel->tab[2]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[2]->condition = NULL; rel->num_tables = 3;

[0831] Process Relational Information

[0832] This is the second phase of the preparation process together allinformation to generate SQL statements. It recursively processes eachRDB_node for each attribute, text, and element node of DAD, and recordsthe mapping relationship into the REL data structure. process_rel(rel,dom_node,current_level, dxx_type) { /* The first pass fills ineverything in rel, except foreign_key. */ process_rel1(rel,dom_node,currenr_level, dxx_type): if (dxx_type == DXX_DECOMPOSE) { /*The second pass fills in the foreign_key of each table in rel and updatenum_col. */ process_rel2(rel); } findTopElements(rel dom_node.dxx_type); Create a hash table or sorted array of all top_elementsrel->top_elements for fast search. } process_rel1(rel, dom_node,dxx_type) { if (dom_node type == element_node) { rel->current_level + +;child = DOMGetFirstChild; while ( child != NULL) {process_rel1(rel,child,dxx_type); child = DOMGetNextchild; } { if (type==attribute node || type ==text_node || (dom_node is a leaf element node&& DXX_DECOMPOSE)) { call DOMgetChild to get the RDB_node of dom_node;if (there is no RDB_node) return error; callprocess_RDB_node(rel,dom_node, RDB_node.dxx_type); } if (dom_node typeelement_node) rel->current_Level--; } /* Fill in the foreign_key of eachtable in rel. */ process_rel2 (rel) { /* Working variables: */ DXX_TAB*foreignTab; for (tabIndex = 0; tabIndex < rel->num_tables; tabIndex+ +){ current_tab = rel->tab[tabIndex]; for (i = 0; i <current_tab->num_join; i+—+) for (j =0; 1<current_tab>join_info[i].num_join; j+ +) { foreignTab =findTabByName(rel, current_tab->join_info[i].foreign[j].tab); if(foreignTab->level < current_tab->level) {strcpy(current_tab->foreign_key.col,current_tab->join_info[i].foreign[j].col); Find k such thatforeignTab.->col2xml[k].col equals current_tab~>foreion_kev.col;current_tab->foreign_key.xml = foreignTab->col2xml [k].xml;current_tab->foreign_key.xmlType = foreignTab ->col2xml[k].xmlType;current_tab->foreign_key.dataType = foreignTab->col2xml[k].dataType: } }} } } findTopElements(rel, dom_node, dxx_type) { /* findTopElementsfinds the top-element for each table in rel. */ /* findTopElementsassumes that process_rel1 has been called. */ if (dom_node type==element node) { rel->current_level+ + child = DOMGetFirstChild; while(child != NULL) { findTopElements(rel,child,dxx_type); child =DOMGetNextchild; } } if (type ==attribute_node || type =text_node(dom_node is a leaf element node && DXX_DECOMPOSE)) { call DOMgetChildto get the RDB_node of dom_node: if (there is no RDB_node) return error;call findTopElement(rel.dom_node.RDB_node.dxx_type): } if (dom_node type==element node) rel->current_level--; } Example: rel->tab[0].name =“order_tab”; rel->tab[0].primary key name = “order_key”;rel->tab[0].foreign_key.col = “”; rel->tab[0].top_element = “Order”:rel->tab[0].selectDaPtr = NULL; rel->tab[0].num_col = 3;rel->tab[0]->col2xml[0].xml= “Key”;rel->tab[0]->col2xml[0].col=“order_key”;rel->tab[0]->col2xml[0].dataType=“integer”;rel->tab[0]->col2xml[0].xmlType=DXX_ATTRIBUTE:rel->tab[0]->col2xml[0].xmlLevel=0; rel->tab[0]->col2xml[1].xml= “name”;rel->tab[0]->col2xml[1].col=“customer_name”;rel->tab[0]->col2xml[1].dataType=“char(128)”;rel->tab[0]->col2xml[1].xmlType=DXX_TEXT;rel->tab[0]->col2xml[1].xmlLevel=2; rel->tab[0]->col2xml[2].xml=“email”; rel->tab[0]->col2xml[2].col=“customer_email”;rel->tab[0]->col2xml[2].dataType=“char(128)”;rel->tab[0]->col2xml[2].xmlType=DXX_TEXT;rel->tab[0]->col2xml[2].xmlLevel=2; rel->tab[0].num_join = 1;rel->tab[0]->join_info[0]->col = “order_key”;rel->tab[0]->join_info[0]->num_join = 1rel->tab[0]->join_info[0]->foreign[0].col = “o_key”rel->tab[0]->join_info[0]->foreign[0].tab = “part_tab”;rel->tab[0]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONErel->tab[0]->condition = NULL; rel->tab[1].name = “part tab”;rel->tab[1].primary_key_name = “part_key”; rel->tab[1].foreign_key.col =“order_key”; rel->tab[1].foreign_key_dataType = “integer”;rel->tab[1].foreign_key.xml = “Key”; rel->tab[1].foreign_key.xmlType =DXX_ATTRIBUTE; rel->tab[1].top_element = “Part”; rel->tab[1].selectDaPtr= NULL; rel->tab[1]num_col = 5; rel->tab[1]->col2xml[0].xml = “Color”;rel->tab[1]->col2xml[0].col = “Color”;rel->tab[1]->col2xml[0].dataType=“char(5)”;rel->tab[1]->col2xml[0].xmlType=DXX_ATTRIBUTE;rel->tab[1]->col2xml[0].xmlLevel=1; rel->tab[1]->col2xml[1].xml = “Key”;rel->tab[1]->col2xml[1].col= ‘part_key”;rel->tab[1]->col2xml[1].dataType=“integer”;rel->tab[1]->col2xml[1].xmlType=DXX_TEXT;rel->tab[1]->col2xml[1].xmlLevel=2; rel->tab[1]->col2xml[2].xm1=“Quantity”; rel->tab[1]->col2xml[2].col=“qty”;rel->tab[1]->col2xml[2].dataType=“integer”;rel->tab[1]->col2xml[2].xmlType=DXX_TEXT;rel->tab[1]->col2xml[2].xmlLevel=2; rel->tab[1]->col2xml[3].xml=“ExtendedPrice”; rel->tab[1]->col2xml[3].col=“price”;rel->tab[1]->col2xml[3].dataType=“real”;rel->tab[1]->col2xml[3].xmlType=DXX_TEXT;rel->tab[1]->col2xml[3].xmlLevel=2 rel->tab[1]->col2xml[4].xml= “Tax”;rel->tab[1]->col2xml[4].col=“Tax”;rel->tab[1]->col2xml[4].dataType=“real”;rel->tab[1]->col2xml[4].xmlType=DXX_TEXT;rel->tab[1]->coI2xml[4].xmlLevel=2; rel->tab[1].num_join = 2;rel->tab[1]->join_info[0]->col = “o_key”;rel->tab[1]->join_info[0]->type = “integer”;rel->tab[1]->join_info[0]->num_join = 1;rel->tab[1]->join_info[0]->foreign[0].col = “order_key”rel->tab[1]->join_info[0]->foreign[0].tab = “order_tab”;rel->tab[1]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONErel->tab[1]->join_info[1]->col = “part_key”rel->tab[1]->join_info[1]->type = “integer”;rel->tab[1]->join_info[1]->num_join = 1;rel->tab[1]->join_info[1]->foreign[0].col = “p_key”;rel->tab[1]->join_info[1]->foreign[0].tab = “ship_tab”;rel->tab[1]->join_info[1]->foreign[0].indexInSQLDA = DXX_NONE:rel->tab[1]->condition = “price > 2500.00”; rel->tab[2].name =“ship_tab”; rel->tab[2].primary_key_name = “ship_date”;rel->tab[2].foreign_key.col = “part_key”;rel->tab[2].foreign_key.dataType = “integer”;rel->tab[2].foreign_key.xml = “Key”; rel->tab[2].foreign_key.xmlType =DXX_TEXT; rel->tab[2].top_element = “Shipment”; rel->tab[2].selectDaPtr= NULL; rel->tab[2].num_col = 2; rel->tab[2]->col2xml[0].xml =“ShipDate”; rel->tab[2]->col2xml[0].col=“date”;rel->tab[2]->col2xml[0].dataType=“date”;rel->tab[2]->col2xml[0].xmlType=DXX_TEXT;rel->tab[2]->col2xml[0].xmlLevel=3; rel->tab[2]->col2xml[1].xml=“ShipMode”; rel->tab[2]->col2xml[1].col = “mode”;rel->tab[2]->col2xml[1].dataType=“char(128)”;rel->tab[2]->col2xml[1].xmlType=DXX_TEXT;rel->tab[2]->col2xml[1].xmlLevel=3; rel->tab[2].num_join = 1;rel->tab[2]->join_info[0]->col = “p_key” rel->tab[2]->join_info[0]->type= “integer”; rel->tab[2]->join_info[0]->num_join = 1;rel->tab[2]->join_info[0]->forign[0].col = “part_key”;rel->tab[2]->join_info[0]->forign[0].tab = “part_tab”;rel->tab[2]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[2]->condition = “date > ‘1966-01-01’ ”; rel->num_tables = 3;process RDB_node(rel,dom_node, rdb_node, dxx_type) { working variables:DXX_TAB    *mytab; int      i; /* It is attribute_node, text_node, or aLeaf element_node in DAD. */ /* Note: table must be specified inRDB_node * mytab = rel->tab[?] such that mytab.name = table of RDB_nodeif no mytab found return error; 1 mytab->num_col;mytab->col2xml[i].col=column of RDB_node; mytab->col2xml[i].xmlLevel =current_level; if (dxx_type ==DXX_DECOMPOSE) {mytab->col2xml[i].dataType= type of RDB_node; } if (it is anattribute_node) { mytab->col2xml[i].xml=name of the attribute if(dxx_type == DXX_DECOMPOSE) mytab->col2xml[i].xmlType DXX_ATTRIBUTE; }else if (it is a text_node) { mytab->col2xml[i].xml=name of the parentelement_node; if (dxx_type = DXX_DECOMPOSE) mytab->col2xml[i].xmlType =DXX_TEXT; } else { /* It is a leaf element_node in DAD. */mytab->col2xml[i].xml=name of element_node; if (dxx_type ==DXX_DECOMPOSE) mytab->col2xml[i].xmlType = DXX_ELEMENT; }mytab->num_col+ +; if (condition != NULL) add condition tomytab->condition; }

[0833] Definition of a Qualifying Parent:

[0834] The parent p of an element qualifies for a table t if all of thefollowing four conditions are met;

[0835] 1.p exists.

[0836] 2.p is intermediate.

[0837] 3.p has multi_occurrence==YES

[0838] 4.p does not have any child mapped to a table other than t.

[0839] Intuitively, if a parent qualifies for a table, it is a candidateto be chosen as the top-element of the table.

[0840] Definition of Top-Element:

[0841] Let e be the highest element in DAD that belongs to a table t.

[0842] If e does not have a parent qualifying for t, then e is thetop-element of t.

[0843] Otherwise, the top-element of t is the highest element in theparent chain of e that does not have a parent qualifying for t.

[0844] findTopElement(rel,dom_node, rdb_node, dxx_tape)findTopElement(rel,dom_node, rdb_node, dxx_type) { working variables:DXX_TAB *mytab; int     i; /* It is attribute_node, text_node, or a leafelement_node in DAD. */ /* Note: table must be specified in RDB_node */mytab = rel->tab[?]such that mytab.name = table of RDB_node if no mytabfound return error; if ( mytab->top_element ==NULL ) { parent_node =parent element_node of this dom_node; grandparent_node =parent of theparent node; while ( grandparent_node exists && grandparent_node isintermediate && grandparent_node has multiple occurrence &&grandparent_node does not have children mapped to other tables ) {parent node =grandparent node; grandparent_node = parent of thegrandparent_node: } mytab->top_element = name of the parent_node; } }

[0845] Process Root Element

[0846] This is the second phase which will traverse the DOM tree togenerate XML document.

[0847] process_Root_element(SQLHDBC hdbc, process_Root_element(SQLHDBChdbc, DXX_REL *rel, DOMnode root_element_node, char *result_tabname.char *heading_buf, int m, int *n) { working variables: char[]element_stmt; /* temp stmt for element */ DxxBUf outbuf; char[]element_name; char{} top_sql_stmt; SQLDA *selectPtr; DXX_TAB*current_tab; /* A node is trivial if it is intermediate withmulti_occur == NO. */ /* Being trivial is the only case when current_tabis not needed. */ if(root_element is trivial){ prepare outbuf, appendheading_buf to outbuf;process_element_node(hdbc,rel,NULL,root_element_node.outbuf); insertoutbuff into result_table; release outbuf; } else { Get element_name;Find current_tab such that rel->table[i].top_element == element_name if(current_tab is not found)dxxFindTabForIntermediate(root_element_node.rel.&current_(—) tab);current_tab->level = 0; rel->current_level = 0; Callgenerate_SQL(rel,current_tab,top_sql_stmt); execute top_sql_stmt byhstmt, get data into selectPtr; for (i =0; i<m; i++){ fetch on hstmt;current_tab->sqldaPtr = selectDaPtr; current_tab->hstmt = hstmt; prepareoutbuf; append heading_buf to outbuf;process_element_node(hdbc.rel.current_tab.root_element_(—) node.outbuf);insert outbuff into result_table; release outbuf; } /* end of while loop*/ } /* endif intermediate element with multi_occurrence == NO */ Freehstmt, all allocated space, etc. }

[0848] Find the Table for an Intermediate Element with multi_occurrenceYES dxxFindTabforlntermediate(DOM_Node elementNode,   /* input */DXX_REL    *rel,     /* input */; DXX_TAB    **current_tab) /* output */{ Find the first current_tab by traversing down elementNodebreadth-first such that rel->table[i].top_element ==sub_element_name if(current_tab not found) Error: no way to generate the multipleoccurrences because there is no table to fetch from. }

[0849] Generate SQL Statement

[0850] Generate SQL statement using the rel sturcture during the firstphase. Two keys there:

[0851] 1. How to add additional columns to be selected in order to passtheir values down to low level SQL statements;

[0852] 2. How to add additional predicates to the where clause bysetting the join value. generate_SQL_stmt(rel,    /* relational info */dxx_tab,  /* input */ sql_stmt) /* output */ { working variables:char[]select_stmt; char]]fromwhere_stmt; int   indexInSQLDA;char[]jValue; init sql_stmt; /* generate select clause */sprintf(select_stmt.“select ”); for ( i=0; i< dxx_tab->num_col; i++ ) {sprintf(select_stmt, “ %s,”,dxx_tab->col2xml[i].col);dxx_tab->col2xml[i].indexInSQLDA = i; } /* identify column values needto be passed down */ indexInSQLDA = dxx_tab->num_col; for (i = 0; i <dxx_tab->num_join; i+ + ){ check whether dxx_tab->join_info[i].col is inthe dxx_tab->col2xml[] array; if not { then /* code may need to selectit */ for ( j=0; j< dxx_tab->join_info[i].num_join; j+ + ) check whetherthe dxx_tab->join_info[i].foreign[j].tab'level is less thancurrent_tab->level; if yes: strcat(select_stmt,dxx_tab->join_info[i].col); dxx_tab->join_inf[i]->indexInSQLDA =indexInSQLDA; indexInSQLSA+ +; break; /* out from inner loop */ } } }/*end of for */ /* generate from and where clause */sprintf(fromwhere_stmt,“ from %s where %s\n”, dxx_table->namedxx_table->condition); /* identify values got from parent query to beput in the condition This is the join condition */ for (i = 0; i <dxx_tab->num_join: i+ +) { for (j = 0; j< dxx_tab>join_info[i].num_join:j+ + ) { check whether the dxx_tab->join_info[i].foreign[j].tab's levelis greater than current_tab->level; if yes then {getJvalue(rel,& dxx_tab->join_info[i].foreign[j], dxx_tab->name,dxx_tab->join_info[i].col.& jValue); } add “ANDdxx_tab->join_info[i].col = jValue” to fromwhere_stmt; } }strcat(sql_stmt, select_stint); strcat(sql_stmt, fromwhere_stmt);strcpy(dxx_tab->sql_stmt, sql_stmt); }

[0853] Example: 1. select order_key, customer_name, customer_name fromorder_tab 2. select part_key, price, qty, tax from part_tab whereprice > 2500.00 AND order_key = jValue 3. select date, mode, commentsfrom ship_tab where date > 1996-0601 AND part_key = jValue Get JoinValue Get the value from higher level query to from thegetJvalue(DXX_REL   *rel, DXX_TAB_COL *join_col, char  *current_tabname,char  *current_colname, char  **jValue){ working variables:DXX_TAB  *foregn_tab; DXX_TAB_COL *foreign_col; int   indexInSQLDA; 1.find foreign_tab = rel-> table[?]such that its name = join_col->tab; 2.find foreign_col=foreign_tab->col2xml[?].col. such thatforeign_tab-col2xm[?]=join_col.col; 3. if foreign_col is not found thenfind foreign_col=foreign_tab>join_info[i].foreign[j].col such thatforeign_tab->join_info[i].foreign[j].col=current_colname ANDforeign_tab->join info[i].foreign[j].tab=current_tabname 4. indexInSQLDA= foreign_col.indexInSQLDA; 5. get the jValue fromparent_tab->selectDaPtr using indexInSQLDA; 6. return jValue }Example: 1. For the 2nd Sql_stmt: 0.join_col->col = “order_key”  join_col->tab = “order_tab” 1. parent_tab->name = “order_tab” 2.parent_col = parent_tab->join_info [0].foreign[0]   parent_col->tab =“order tab”   parent_col->col = “order_key” 3. parent_col->indexInSQLDA= 0; 4. jValue can be “1”.

[0854] 6. return jValue

[0855] }

[0856] Example:

[0857] 1. For the 2nd Sql_stmt:

[0858] 0. join_col->col=“order_key”

[0859] join_col->tab=“order_tab”

[0860] 1. parent_tab->name=“order_tab”

[0861] 2. parent_col=parent_tab->join_info[0].foreign[0]

[0862] parent_col->tab=“order_tab”

[0863] parent_col->col=“order_key”

[0864] 3. parent_col->indexInSQLDA=0;

[0865] 4. jValue can be “1”.

[0866] Process Element Node

[0867] Process element node and generate the element stmt

[0868] process_element_node(hdbc, process element_node(hdbc, rel,current_tab, element_node, outbuf) { working variable: SQLDAselectDaPtr; int   isTableNew = FALSE; int   hasNotSeenNonAttr = TRUE;int   isLastChildElement = FALSE; int   isChildNewTopElement = FALSE;int   isChildAttrNode = FALSE; int   isChildInCurrentTab = FALSE; int  isChildIntermediate = FALSE; int   isChildTrivial = FALSE;rel->current_level+ +; get element_name for element_node; append “<” andelement_name to outbuf; call DOMgetChildNodes to get a list of allchildren of this element_node. for each child_node, do { /* Set theboolean variables for determining if this child should be generated froma table different from the current one. */ Set isChildIntermediate toTRUE if child_node does not have any text or attribute child. /* A childis trivial iff it is intermediate with multi_occur ==NO. */ /* Beingtrivial is the only case when current_tab is not needed. */ SetisChildTrivial to TRUE if the child_node is trivial. if (current_tabNULL) { set isChildlnCurrentTab to TRUE, if the child_node's name equalsto current tab's top element or in its col2xml[]; } else { if(isChildTrivial) isChildlnCurrentTab = TRUE: else isChildlnCurrentTab =FALSE; } set isChildAttrNode to TRUE if the child_node is theattribute_node: if( !isChiidInCurrentTab ) { looping rel->tab[]to seewhether the child_node is the top_element of some rel->tab[?]; if foundsuch rel-tab[?]and rel->tab[?]->sqldaPtr=NULL set isChildNewTopELementto TRUE; current_tab = rel->tab[?]; } } /* Check if this child should begenerated from a new table. */ /* - An element node does not need a newtable unless (it is a new top element || (current_tab ==NULL && child isnot trivial)) */ /* - An attribute node needs a new table only if itsvalue can not be retrieved from the current table. */ /* - A text nodenever needs a new table. */ if (isChildNewTopElement || (current_tab==NULL && !isChildTrivial) || ((isChildAttrNode) &&(!isChildInCurrentTab))) /* Handle new table. */ { isTableNew DXX_TRUE;/* Find the right table. */ if ( childNewTopElement) current_tab =rel->table[1]such that current_tab->col2xml[?].xml child node's name if(current_tab is not found && isChildIntermediate)dxxFindTabForlntermediate(child_node, rel, & current tab); } if(current_tab->sql_stmt[0] == ‘\0’) { /* Generate, prepare, and executean SQL query. */ currenthd —tab->level rel->current_level callgenerate_SQL(current_tab,sql_stmt); prepare and execute the sql_stmtwith hstmt and selectDaPtr; } current_tab->sqldaPtr = selectDaPtr;current_tab->hsmt = hsmt; fetch hstmt; }/* End of handling the new table*/ if the child_node is an attribute_node, do { call processatt_text_node(rel,& child_node.current_tab,DXX_ATTRIBUTE.outbuf); if notlast one append outbuf with “,” else append outbuf with “>”; } if thechild_node is a text_node, callprocess_att_text_node(rel.&child_node,current_tab,DXX_TEXT,outbuf); ifthe child_node is an element_node { if (isTableNew } { /* Loop */ /*Generate possible multiple occurrences of child_node. */ /* The numberof occurrences equals the number of rows to be fetched. */ while ( fetchok ) { process_element_node(hdbc,rel,current_tab.&child_node,outbuf);fetch on current_tab->hstmt; /* NOTE: current_tab->selectDaPtr has newstuff */ }/* end of while loop */ clean/free current_tab->sqldaPtr andcurrent_tab->hstmt: } else { /* same old table */ /* Generate a singleoccurrence of child_node. call processelement_node(hdbc.rel.current_tab,&child_node.outbuf): } } /* End ofchild_node is element */ } /* End of loop “for each child” */rel->current_level--; append outbuf with (“</%s>\n”.element_name); }

[0869] working, variable:

[0870] Process Attribute/Text Node

[0871] A process attribute or text node which has no other child nodeother RDB_node, generate the XML content for attribute and text.

[0872] process_att_text_node(DXX_REL *rel; process_att_text_node(DXX_REL*rel; DXX_TAB *current_tab; DOMElement*node: int type; DXXOutBuf*outbuf){ working variables: char[] attname; char[] eleName; char[] xmlname;char[] value; char[] tabname; 1. if (type = DXX ATTRIBUTE) { callDOMgetAttribute to get the attname. xmlname = attname; } else {/*textnode */ find the name of its parent element_node, call it eleName;xmlname = eleName; } 2. call DOMgetChild to get the RDB_node tabname =RDB_nodes' table name colname = RDB_nodes' column name 4. if ( tabname =current_tab->name) call getValue(current_tab ,xmlname.&value); else {loop rel to find rel->tab[?] such rel->tab[?]->name=tabname; if found,call getValue(rel->tab[?],xmlname,&value); else{ internal error; } } 5.if( type == DXX_ATTRIBUTE) { append outbuf with (“%s=%s”,attname,value);else append outbuf with (“%s”, value); }

[0873] Get Value Get the value as the column value from relationaltable. getValue(DXX_TAB *current_tab, char *nodename, char *attr_value){ working variable: int indexINSQLDA; for ( i = 0; i <current_tab->num_col; i++) if( current_tab->col2xml[i].xml = nodename) {indexInSQLDA=current_col2xml[i].indexInSQLDA; get attr_value fromcurrent_tab->selectDaPtr using indexInSQLDA; break; } } }

G. A TECHNIQUE TO STORE FRAGMENTED XML DATA INTO A RELATIONAL DATABASEBY DECOMPOSING XML DOCUMENTS WITH APPLICATION SPECIFIC MAPPINGS

[0874] This invention presents a technique which stores fragmented XMLdata into relational database tables by decomposing XML documents withapplication specific mappings. The mapping is based on the XML PathLanguage (Xpath) data model. Using this invention, a user can shred XMLdocuments into new or existing database tables. This makes a relationaldatabase a repository of fragmented XML data.

[0875] The technique parses an incoming XML document to be decomposedand parses an XML formatted Data Access Definition (DAD) withapplication specific mapping based on the XPath data model, generatingtwo Document Object Model (DOM) trees. The DAD identifies relationaltables and columns. One DOM tree is an XML document DOM tree and theother is a DAD DOM tree. The technique then works on both DOM trees tomap data in the incoming XML document DOM tree to columns in relationaltables, according to the DAD DOM tree.

[0876] Additionally, the technique identifies different relationallevels and XML levels. Next, the technique generates SQL insertionstatements based on the relational level, while taking data from a listof multiple occurrence XML element trees in the same XML level.Additionally, optimization and recursion techniques are used.

[0877]FIG. 11 is a flow diagram illustrating the steps performed by theXML System to decompose XML documents with application specificmappings. In block 1100, the XML System receives an XML documentcontaining XML data. In block 1102, the XML System parses the XMLdocument to generate an XML Document Object Model (DOM) tree. In block1104, the XML System receives a data access definition (DAD) thatidentifies one or more relational tables and columns. In block 1106, theXML System processes the DAD to generate a DAD Document Object Model(DOM) tree. In block 1108, the XML System maps data from the XML DOMtree to columns in relational tables according to the DAD DOM tree.

[0878] G.1 Decomposing an XML Document into an XML Collection

[0879] Decomposition refers to breaking down the data inside of an XMLdocument and storing it into one or more relational tables. The datastored is basically un-tagged. The XML System provides stored proceduresto decompose XML data from an XML document. A user always needs todefine a DAD for the decomposition. The user may enable an XMLcollection with a DAD first, then use the stored procedures. Forexample, when decomposing XML documents into new tables, an XMLcollection must be enabled so that all tables in the XML collection canbe created by the XML System. For some reason, a user may want to useexisting tables to add additional data from incoming XML documents. Inthis case, the user needs to alter the tables to make sure the columnsspecified in the DAD exist in the tables. The enable collectionoperation will check this. If the user does not enable the XMLcollection, the user must pass the DAD to the stored procedure. Thesequence order of element or attribute of multiple occurrence will bereserved, only for tables created by the XML System.

[0880] G.1.1 Specifying an Xcollection

[0881] In the DAD, a user still needs to specify the Xcollection.

[0882] G.1.1.l Mapping Scheme in XML Collections

[0883] The mapping between composed/decomposed XML documents and an XMLcollection is specified in the Xcollection of a DAD. The XML Systemadapts the notation used in XSLT and uses a subset of it to define theXML document structure. In order to facilitate the mapping, the XMLSystem introduces the element Relational DataBase node (RDB_node) to theXcollection.

[0884] The DAD defines the XML document tree structure using seven kindsof nodes defined by XSLT/XPath:

[0885] root_node

[0886] element_node

[0887] text_node

[0888] attribute_node

[0889] namespace_node

[0890] processing_instruction_node

[0891] comment_node

[0892] For simple and complex compositions, the RDB_Node is used todefine where the content of an XML element or value of an XML attributeis to be stored or retrieved.

[0893] The RDB_Node has the following components:

[0894] Table: the name of the relational table or updateable view, inwhich the XML element content or attribute value is to be stored.

[0895] Column: the name of the column which contains the element contentor attribute value.

[0896] Condition: the predicates in the WHERE clause to select thedesired column data.

[0897] In the definition of an Xcollection, for this embodiment of theinvention, the following approach is used to define the mapping:

[0898] RDB_node Mapping: Specify RDB node for element_node, text_nodeand attribute_node in the XML data model. In this case, the XML Systemwill generate SQL statements based on the RDB_nodes and document treestructure.

[0899] The text_node and attribute_node will have a one-to-one mappingto/from a column in a relational table. Therefore, each of them willhave a RDB_node to define the mapping, where the RDB_node is needed forthe RDB_node mapping. It is possible that an element_node has notext_node but only child element_node(s).

[0900] Using a RDB_node to specify an element_node, text_node andattribute_node is more general. For an element_node, only the rootelement_node needs to have a RDB_node. In this RDB_node, the user isrequired to specify all tables used to compose/decompose data. as wellas a join condition among these tables. The condition predicate in thisRDB_node will be pushed down from the root element_node to all childnodes. Ideally, the way to tie all tables together within an XMLcollection is the primary-foreign key relationship. However. it oftenhappens that some existing user tables do not have such a relationship.Therefore, requiring the foreign key relationship for composition is toorestrictive. However, in the case of decomposition, if new tables arecreated for storing the decomposed XML data, then the DAD requires auser to specify the primary key of each table, as well as theprimary-foreign key relationship among tables.

[0901] G.1.1.2 Sample DADs for XML Collections

[0902] Assuming the XML documents need to be composed or decomposed arelike the one shown in the example above, the following sample DAD showshow to define the mapping from relational tables using RDB_node Mapping.In particular, the following example DAD shows how to compose/decomposea set of XML documents from/into three relational tables while using theRDB_node to specify the mapping. Litem_DAD3.dad <?xml version=“1.0”?><!DOCTYPE Order SYSTEM “E:\dtd\dad.dtd”> <DAD><dtdid>E:\dtd\lineItem.dtd</dtdid> <validation>YES</validation><Xcollection> <prolog>?xml version=“1.0”?</prolog> <doctype>!DOCTYPEOrder SYSTEM “E:\dtd\lineItem.dtd”</doctype> <root_node> <element_nodename=“Order”> <RDB_node> <table name=“order_tab” key=“order_key”/><table name=“part_tab” key=“part_key”/> <table name=“ship_tab”key=“date”/> <condition> order_tab.order_key = part_tab.order_key ANDpart_tab.part_key = ship_tab.part_key </condition> </RDB_node><attribute_node name=“Key”> <RDB_node> <table name=“order_tab”/> <columnname=“order_key” type=“integer”/> </RDB_node> </attribute_nodename=“Customer”> <text_node> <RDB_node> <table name=“order_tab”/><column name=“customer” type=“char(128)”/> </RDB_node> </text_node></element_node> <element_node name=“Part”> <attribute_node name=“Key”><RDB_node> <table name=“part_tab”/> <column name=“part_key”type=“integer”/> </RDB_node> </attribute_node> <element_nodename=“Quantity”> <text_node> <RDB_node> <table name=“part_tab”/> <columnname=“qty” type=“integer”/> </RDB_node> </text_node> </element_node><element_node name=“ExtendedPrice”> <text_node> <RDB_node> <tablename=“part_tab”/> <column name=“price” type=“real”/> <condition> price&amp gt; 2500.00 </condition> </RDB_node> </text_node> </element_node><element_node name=“Tax”> <text_node> <RDB_node> <tablename=“part_tab”/> <column name=“tax” type=“real”/> </RDB_node></text_node> </element_node> <element_node name=“shipment”><element_node name=“ShipDate”> <text_node> <RDB_node> <tablename=“ship_tab”/> <column name=“date” type=“date”/> <condition> date&amp gt; “1996-01-01” </condition> </RDB_node> </text_node></element_node> <element_node name=“ShipMode”> <text_node> <RDB_node><table name=“ship_tab”/> <column name=“mode” type=char(120)”/></RDB_node> </text_node> </element_node> <element_node name=“Comment”><text_node> <RDB_node> <table name=“ship_tab”/> <column name=“comment”type=“varchar(2k)”/> </RDB_node> </text_node> </element_node></element_node> <! -- end of element Shipment> </element_node> <! -- endof element Part ---> </element_node> <! -- end of element Order ---></root_node> </Xcollection> </DAD>

[0903] G.1.2 Defining Xcollection for Decomposition in DAD

[0904] One DAD can used for both composition and decomposition. Fordecomposition, additional information is required to be specified in theDAD, however this information is just ignored when the DAD is used forcomposition.

[0905] The additional information needed for decomposition is describedbelow:

[0906] Primary Key for each table in the RDB_node for the root elementnode

[0907] In order to tie XML collection tables together, theprimary-foreign key relationship is preserved among the tables. Theprimary key can consists of a single column or as a composite oneconsisting of multiple columns. The primary key must be specified foreach table. A user does so by adding the attribute key to the tableelement of the RDB_node. In the following example, the RDB_node of theroot element_node “Order” is defined as: <element_node name=”Order”><RDB_node> <table name=“order_tab” key=“order_key”> <tablename=“part_tab” key=“part_key, price”/> <table name=“ship_tab”key=“date”/> <condition> order_tab.order_key = part_tab.order_key ANDpart_tab.part_key = ship_tab.part_key </condition> </RDB_node>

[0908] with the keys specified. In the above example, the primary key ofpart_tab is a composite one.

[0909] Data Type of the column

[0910] In order to create the right data type for each column whencreating new tables during the enabling XML collection process, thecolumn type must be specified for the RDB_node for each attribute_nodeor text_node. This is because the real data is mapped from an XMLattribute value or element text to relational columns. A user does so byadding the attribute tape to the column element. In the followingexample, the RDB_node of attribute_node “Key” of the element_node“Order” is defined as: <attribute_node name=“Key”> <RDB_node> <tablename=“order_tab”/> <column name=“order_key” type=“integer”/> </RDB_node></attribute_node> where the type is defined as integer.

[0911] where the type is defined as integer.

[0912] G.1.3 Enabling an XML Collection

[0913] The purpose of enabling an XML Collection for decomposition is toparse a DAD, create new tables or check the mapping against existingtables. The DAD is stored into the XML_USAGE table when the XMLCollection is enabled.

[0914] When a user prefers to have the XML System create collectiontables, the user should enable the XML collection. Additionally, theenablement depends on the stored procedure the user chooses to use. Thestored procedure dxxInsertXML( ) will take XML Collection name as theinput parameter. In order to use the stored procedure dxxInsertXML( ),the user must enable an XML collection before calling it. The user cancall stored procedure dxxShredXML( ) without the enabling of an XMLcollection by passing a DAD. In the later case, all tables specifiedmust exist in the database.

[0915] G.1.3.1 Enabling XML Collection Option

[0916] For composition, an XML collection is not required to be enabled.The assumption is that all collection tables already exist in thedatabase. The stored procedure can take a DAD as an input parameter andgenerate XML documents based on the DAD. On the other hand. thecomposition is the opposite of the decomposition. For XML collectionsenabled during the decomposition process, the DAD is likely to be usedto compose XML documents again. If the same DAD is used, then thecollection can be enabled for both composition and decomposition.

[0917] An XML Collection can be enabled through the XML Systemadministration GUI (graphical user interface) or using the dxxadmcommand with the Enable_collection option. The syntax of the option on aDB2 server is as follows:

[0918] dxxadm enable_collection db_name collection_name DAD_file [-ttablespace]

[0919] where

[0920] a. db_name: name of the database.

[0921] b. collection_name: name of the XML collection, which will beused as the parameter to the stored procedures.

[0922] c. DAD_file: Data Access Definition(DAD) file.

[0923] d. tablespace: Optional. The tablespace contains tables in theXML collection. If new tables need to be created for decomposition, thennew tables will be created in the specified tablespace.

[0924] The following is an example of enabling the XML collection calledsales_order in database mydb with the DAD_file Litem_DAD3.dad.

[0925] /home/u1>dxxadm enable_collection mydb sales_order Litem_DAD3.dad

[0926] DXXA009I XML System is enabling collection sales_order. Pleasewait.

[0927] DXXA010I XML System has successfully enabled the collectionsales_order.

[0928] /home/u1>

[0929] The enable_collection option mainly does the following things toa database:

[0930] e. Read the DAD file, call XML parser to parse DAD, and saveinternal information for mapping.

[0931] f. Store internal information into the XML_USAGE table.

[0932] The option is good for performance and is usually helpful toperform composition and decomposition using one DAD.

[0933] G.1.3.2 Enable collection Option

[0934] The enable_collection option enables an XML collection associatedwith an application table The association between the application tableand the side table specified by the DAD is through the root id.

[0935] Syntax

[0936] dxxadm enable_collection db_name collection DAD_File

[0937] [-t tablespace] [-l login] [-p password]

[0938] Argument

[0939] g. db_name: the database name.

[0940] h. collection: the name of an XML collection.

[0941] i. DAD_File: the file containing the DAD.

[0942] j. tablespace: Optional. The tablespace containing a user tablespecified in the DAD or side table created by the XML System.

[0943] k. login: Optional. The user ID, which is only needed if thecommand is invoked from a DB2 client.

[0944] l. password: Optional. The password, which is only needed if thecommand is invoked from a DB2 client.

[0945] The enable_collection option will enable an XML collection. Theenablement process is to parse the DAD and prepare tables for XMLcollection access. It takes the database name, a name of the XMLcollection, a DAD_File and an optional tablespace. The XML collectionwill be enabled based on the DAD in the DAD File. It checks whether thetables specified in the DAD exist. If the tables do not exist, the XMLSystem will create the tables according to the specification in the DAD.The column name and data type is taken from the RDB_node of anattribute_node or text node. If the tables exist, the XML System willcheck whether the columns were specified with the right name and datatypes in the corresponding tables. If a mismatch is found, an error willbe returned. The tablespace is optional, but it is specified if thecollection tables are to be created in a tablespace other than thedefault tablespace of the specified database.

[0946] The enable_collection is required for decomposition storedprocedure dxxInsertXML( ), and its pairing dxxRetrieveXML( ), and thedxxUpdateXML( ). For stored procedure dxxGenXML( ) and dxxShredXML( )which take a DAD as input, the enablement of an XML collection is notrequired. For the latter stored procedures, it is assumed that alltables specified in the DAD exist in the database already. If they don'texist, an error will be returned. The enable_collection does have apairing disable collection option. But the operation ofdisable_collection is much simpler. It just deletes the collection fromXML_USAGE table.

[0947] As discussed above in Section E, FIG. 7 is a diagram illustratingcode organization to compose XML documents. The dxxGenXML( ) storedprocedure 700 and the dxxRetrieveXML( ) stored procedure 702 each invokethe dxxComposeXML( ) module 704. Decision block 706 determines whetherRDB_node mapping is to be performed. If RDB_node mapping is not to beperformed, processing continues with SQL to XML module 708 and the SQLto XML mapping is performed by the SQL to XML mapping, technique 710. IfRDB_node mapping is to be performed, processing continues with thedxxGenXMLFrom RDB( ) module 712 and RDB_node mapping is performed by theRDBR technique.

[0948] G.1.4 Using Stored Procedures for Decomposition

[0949] The decomposition of XML documents from an XML collection isperformed through the use of stored procedures. The XML System providesthe following stored procedures to compose documents.

[0950] dxxInsertXML( )

[0951] The dxxInsertXML( ) takes two input parameters, the name of anenabled XML collection and the XML document to be decomposed. It returnstwo output parameters: an error code and an error message.

[0952] The stored procedure dxxInsertXML( ) inserts an input XMLdocument into an enabled XML collection which is associated with a DAD.The collection tables and the mapping are specified in the DAD. Duringthe enabling process, the XML System has already checked or created thecollection tables according to the specification of Xcollection. Thestored procedure dxxInsertXML( ) will decompose the input XML documentaccording to the mapping specified in the DAD and insert un-tagged XMLdata into the tables of the named XML collection.

[0953] Stored Procedure Declaration:

[0954] dxxInsertXML(char(UDB_SIZE) collectionName, /* input */

[0955] CLOB AS LOCATOR xmlobj, /* input */

[0956] long returnCode, /* output */

[0957] varchar(1024) returnMsg) /* output */

[0958] Parameters:

[0959] collectionName: IN, the name of an enabled XML collection.

[0960] xmlobj: IN, an XML document object in CLOB LOCATOR type.

[0961] returnCode: OUT, return code in case of error,

[0962] returnMsg: OUT, message text in case of error.

[0963] The following is an example of this stored procedure call.

[0964] EXEC SQL INCLUDE SQLCA;

[0965] EXEC SQL BEGIN DECLARE SECTION;

[0966] char collection[64]; /* name of an XML collection */

[0967] SQL TYPE is CLOB_FILE xmlDoc; /* input XML document */

[0968] long returnCode; /* error code */

[0969] char returnMsg[1024]; /* error message text */

[0970] short collection_ind;

[0971] short xmlDoc_ind;

[0972] short returnCode_ind;

[0973] short returnMsg_ind;

[0974] EXEC SQL END DECLARE SECTION;

[0975] /* initialize host variable and indicators */

[0976] strcpy(collection,“sales_order”)

[0977] strcpy(xmlobj.name,“e:\xml\order1.xml”);

[0978] xmlobj.name_lenth=strlen(“e:\xml\order1.xml”);

[0979] xmlobj.file_option=SQL_FILE_READ;

[0980] returnCode=0;

[0981] returnMsg[0]=‘0’;

[0982] collection_ind=0;

[0983] xmlobj_ind=0;

[0984] returnCode_ind=−1;

[0985] returnMsg_ind=−1;

[0986] /* Call the stored procedure */

[0987] EXEC SQL CALL db2xml!dxxInsertXML(:collection:collection_ind;

[0988] :xmlobj:xmlobj_ind,

[0989] :returnCode:returnCode_ind.:returnMsg:returnMsg_ind);

[0990] If the XML collection sales_order is enabled with Litem_DAD3.dad,then the dxxInsertXML( ) call will decompose the input XML document“e:\xml\order1.xml” and insert data into the sales_older collectiontables according to the mapping specified in Litem_DAD3.dad.

[0991] dxxShredXML( )

[0992] The stored procedure dxxShredXML( ) work, as same as the storedprocedure dxxInsertXML( ) except that it takes a DAD as the first inputparameter instead of a name of an enabled XML collection. Therefore, itcan be called without enabling an XML collection.

[0993] The stored procedure dxxShredXML( ) inserts an input XML documentinto an XML collection according to the Xcollection specification in theinput DAD. If the tables used in the Xcollection of the DAD do not existor the columns do not meet the data types specified in the DAD mapping,an error will be returned. The stored procedure dxxShredXML( )decomposes the input XML document and inserts un-tagged XML data intothe tables specified in the DAD.

[0994] Stored Procedure Declaration

[0995] dxxShredXML(CLOB as LOCATOR dad, /* input */

[0996] CLOB as LOCATOR xmlobj, /* input */

[0997] long *returnCode, /* output */

[0998] varchar(1024) *returnMsg) /* output */

[0999] Parameters:

[1000] dad: In, a DAD in CLOB Locator type,

[1001] xmlobj: IN, an XML document object in CLOB LOCATOR type.

[1002] returnCode: OUT, return code in case of error,

[1003] returnMsg: OUT, message text in case of error.

[1004] The following is an example of the dxxShredXML( ) call. EXEC SQLINCLUDE SQLCA; EXEC SQL BEGIN DECLARE SECTION; SQL TYPE is CLOB_LOCATORdad; /* DAD*/ SQL TYPE is CLOB_FILE xmlDoc; /* input XML document */long returnCode; /* error code */ char returnMsg[1024]; /* error messagetext */ short dad_ind; short xmlDoc_ind; short returnCode_ind; shortreturnMsg_ind; EXEC SQL END DECLARE SECTION; /* initialize host variableand indicators * /* make dad as the CLOB_LOCATOR of dad *strcpy(xmlobj.name,“e:\xml\order1.xml”);xmlobj.name_length=strlen(“e:\xml\order1.xml”);xmlobj.file_option=SQL_FILE_READ; returnCode = 0; returnMsg[0] = ‘\0’;dad_ind = 0; xmlobj_ind = 0; returnCode_ind = −1; returnMsg_ind = −1; /*Call the stored procedure */ EXEC SQL CALLdb2xml!dxxShredXML(:dad:dad_ind; :xmlobj:xmlobj_ind,:returnCode:returnCode_ind,:returnMsg:returnMsg_ind);

[1005] If the content of DAD_buf has the Litem_DAD3.dad content, thenthe dxxShredXML( ) call will decompose the input XML document“e:\xml\order1.xml” and insert data into the sales_order collectiontables.

[1006] G.2 Example Decomposition

[1007] The following illustrates an example of decomposing an XMLdocument into a relational database using a RDB_node in the DAD. XMLDocument to be input: <?xml version=“1.0”?> <!DOCTYPE Order SYSTEM“E:\dxx\test\dtd\litem.dtd”> <Order Key=“1”> <Customer> <Name>GeneralMotor</Name> <Email>parts@gm.com</Email> </Customer> <Part Color=“red”><Key>68</Key> <Quantity>36</Quantity><ExtendedPrice>34850.16</ExtendedPrice> <Tax>0.06</Tax> <Shipment><ShipDate>1998-08-19</ShipDate> <ShipMode>BOAT</ShipMode> </Shipment><Shipment> <ShipDate>1998-08-20</ShipDate> <ShipMode>AIR</ShipMode></Shipment> </Part> <Part Color=“red”> <key>128</key><Quantity>28</Quantity> <ExtendedPrice>38000.00</ExtendedPrice><Tax>0.07</Tax> <Shipment> <!-- This shipment will not be inserted. --><ShipDate>1961-01-01</ShipDate> <ShipMode>BOAT</ShipMode> </Shipment><Shipment> ShipDate>1998-12-30</ShipDate> <ShipMode>TRUCK</ShipMode></Shipment> /Part> </Order> Document Access Definition (DAD): <?xmlversion=“1.0”?> <!DOCTYPE Order SYSTEM “E:\dtd\dxxdad.dtd”> <DAD><dtdid>E:\dtd\lineItem.dtd</dtdid> <validation>YES</validation><Xcollection> <root_node> <element_node name=“Order”> <RDB_node> tablename=“order_tab” key=“order_key”/> <table name=“part_tab”key=“part_key”/> <table name=”ship_tab” key=“ship_date”/> <condition>order_tab.order_key=part_tab.o_key AND part_tab.part_key=ship_tab.p_key</condition> </RDB_node> <attribute_node name=“Key”> <RDB_node> <tablename=“order_tab”/> <column name=“order_key” type=“integer”/> </RDB_node></attribute_node> <element_node name=“Customer”> <element_nodename=“Name”> <text_node> <RDB_node> table name=“order_tab”/> <columnname=“customer_name” type=“char(128)”/> </RDB_node> </text_node></element_node> <element_node name=“Email”> <text_node> <RDB_node><table name=“order_tab”/> <column name=“customer_email”type=“char(128)”> </RDB_node> </text_node> </element_node></element_node> <element_node name=“Part”> <attribute_node name=“Color”><RDB_node> <table name=“part_tab”/> <column name=“color”/> </RDB_node></attribute_node> <element_node name=“Key”> <text_node> <RDB_node><table name=“part_tab”/> <column name=“part_key” type=“integer”/></RDB_node> </text_node> </element_node> <element_nodename=“ExtendedPrice”> <text_node> <RDB_node> <table name=“part_tab”/><column name=“price” type=“real”/> <condition> price &amp gt; 2500.00</condition> </RDB_node> </text_node> </element_node> <element_nodename=“Tax”> <text_node> <RDB_node> <table name=“part_tab”/> <columnname=“tax” type=“real”/> </RDB_node> </text_node> </element_node><element_node name=“Quantity”> <text_node> <RDB_node> <tablename=“part_tab”/> <column name=“qty” type=“integer”/> </RDB_node></text_node> </element_node> <element_node name=“shipment”><element_node name=“ShipDate”> <text_node> <RDB_node> <tablename=“ship_tab”/> <column name=“date” type=“date”/> <condition> date&amp gt; ‘1966-01-01’ </condition> </RDB_node> </text_node></element_node> <element_node name=“ShipMode”> <text_node> <RDB_node><table name=“ship_tab”/> <column name=“mode” type=“char(128)”/></RDB_node> </text_node> </element_node> <element_node name=“Comments”><text_node> <RDB_node> <table name=“ship_tab”/> <column name=“comments”type=“varchar(2000)”/> <RDB_node> </text_node> </element_node></element_node> <!-- end of element Shipment --> </element_node> <!--end of element Part --> </element_node> <!-- end of element Order --></root_node> </Xcollection> </DAD>

[1008] G.3 Detailed Techniques

[1009] The following discussion focuses on the technique for oneembodiment of the invention. In particular, the following includespseudocode and data structures used by the technique.

[1010] G.3.1 Data Structures

[1011] The following data structures are used for decompostion of XMLdocuments using RDB-nodes. /* ------------------------------------*Column to XML node pair *-----------------------------------*/ typedefstruct col2xml { char col[DB2_TAB_COL_VIEW_LEN]; chardataType[DB2_TAB_COL_VIEW_LEN+8]; /* data type of the column */ charxml[DXX_XML_FIELD_SIZE]; /* xml can be attribute or element name */ intxmlType; /* DXX_ATTRIBUTE or DXX_TEXT */ int xmlLevel; /* fordetermining if a column needs to be copies to the next row */ intindexInSQLDA; } DXX_COL2XML; /*------------------------------------ *Table and Column pair *-----------------------------------*/ typedefstruct tabcol { char tab[DB2_TAB_COL_VIEW_LEN]; charcol[DB2_TAB_COL_VIEW_LEN]; int indexInSQLDA; } DXX_TAB_COL;/*------------------------------------ * Join information*-----------------------------------*/ typedef struct join_info { charcol[DB2_TAB_COL_VIEW_LEN]; /* column used in the join condition */ intnum_join; /* number of foreign columns to be joined */ DXX_TAB_COLforeign[DXX_NUM_FOREIGN]; /* foreign column and its table to be joined*/ } DXX_JOIN_INFO; /*------------------------------------ * Primary keyInformation: *-----------------------------------*/ typedef structpri_key { int num_col; /* number of columns in the key */ charname[DXX_NUM_MAPPING][DB2_TAB_COL_VIEW_LEN]; /* the column names of theprimary key */ } DXX_PRIKEY; /*------------------------------------ *Foreign key Information: *-----------------------------------*/ typedefstruct for_key { int num_col; /* number of columns in the key */DXX_COL2XML col[DXX_NUM_MAPPING]; /* the columns of the foreign key */ }DXX_FORKEY; /*------------------------------------ * Table Information:------------------------------------*/ typedef struct tab { charname[DB2_TAB_COL_VIEW_LEN]; /* name of the table */ DXX_PRIKEY pri_key;/* primary key */ DXX_FORKEY for_key; /* foreign key */ int level; /*relational level of the table */ char top_element[DXX_XML_FIELD_SIZE];/* the highest level of XML element using * column data of this table */char sql_stmt[MAX_STMT_LEN]; /* SQL statement */ SQLHSTMT hstmt; /* CLIstatement handle */ struct sqlda *sqldaPtr; /* pointer of selected data*/ int num_col; /* number of columns which data used in the * XMLdocument */ DXX_COL2XML col2xml[DXX_NUM_MAPPING]; /* mapping betweencolumns in this table to * XML attribute or element text */ intnum_join; /* number of columns in this table which * will form joinconditions */ DXX_JOIN_INFO join_info[DXX_NUM_JOIN_IN_TABLE]; /* joininformation * char condition[DXX_CONDITION_IN_REL]; /* condition ofselect for columns in this * table */ } DXX_TAB;/*------------------------------------------------ * RelationInformation for entire XML documents*-----------------------------------------------*/ typedef struct rel {int current_level; /* the current XML level during treversal */ intnum_tables; /* number tables to generate/decompose XML doc*/ DXX_TAB*tab[DXX_NUMTAB_IN_REL]; /* details of each table */ char**top_elements; /* index to tab[i]->top_element for fast search. */SQLHDBC hdbc; /* CLI connection handle */ } DXX_REL;/*------------------------------------------------ * Data structure tostore a row *-----------------------------------------------*/ typedefstruct row { int num_col; char **coldata; /* Array of pointers */ }DXX_ROW; /*------------------------------------------------ * Datastructure to store the rows*-----------------------------------------------*/ typedef struct rows {int num_rows; DXX_ROW *row[DXX_MAX_ROWS]; } DXX_ROWS;

[1012] The following routine will enable a column.

[1013] enableColl(dadbuf,tablespace,sqlstate, msgtext)

[1014] dadbuf—a memory buffer containing a document ac

[1015] sqlstate—OUT: the SQL state in case of errors

[1016] msgtext—OUT: message text in case of errors Note that in thefollowing routine, the sqlstate and msgtext are ignored in oneembodiment. DOMElement dad; DOMElement dad_root_element; DXX_RELrelation_info; DXX_REL *rel = &relation_info; /* Parse the DAD andprepare the SQL query */ parse dad_buf to get the DOM tree of DAD;dad_root_element = the first element_node of dad; /* init relationalstructure */ rel->current_level = 0; rel->num_tables = 0; /* setrelational structure */ setRel(rel,dad,DXX_DECOMPOSE); /* processrelational information */process_rel(rel,dad_root_element,0,DXX_DECOMPOSE); /* prepare tables */prepare_tables(rel,tablespace); /* Register collection in XML_usagetable */ insert a new row into XML_usage table, using collection name; }

[1017] G.3.2 Setup Relational Structure

[1018] This routine will process the RDB_node of the top element codeand initialize the REL structure of entire XML documents. After theprocess, all tables involved in composing/decomposing XML documentsshould be included in the REL structure and the relationship betweentables should also be recorded in the REL.

[1019] setRel(rel,dad, dxx_type) { setRel(rel,dad, dxx_type) { workingvariable: DXX_TAB *current_tab; get the RDB_node of thetop_element_node; if no RBD_node return error; for each table inRDB_node do { allocate space for current_tab; current_tab->name =RDB_node's table name; if (dxx_type == DXX_DECOMPOSE)current_tab->primary_key_name = RDB_node's table key;current_tab->top_element = NULL; current_tab->level = 0;current_tab->num_col = 0; current_tab->selectDaPtr = NULL;current_tab->condition = NULL; if ( RDB_node's condition != NULL ) scanthe condition of RDB_node, current_tab->num_join = 0; for eachcurrent_tab.column in the predicate do { i = current_tab->num_join;current_tab->join_info[i].col = current_tab.column;current_tab->join_info[i].num_join = 0; for each other_tab.col =current_tab.column do { j = current_tab->join_info[i].num_join;current_tab->join_info[i].foreign[j].col = col in other_tabcurrent_tab->join_info[i].foreign[j].tab = other_tabcurrent_tab->join_info[i].foreign[j].indexInSQLDA=DXX_NONEcurrent_tab->join_info[i].num_join++; } current_tab->num_join++; }rel->tab[i] = current_tab; rel->num_tables++; } Example:rel->tab[0].name = “order_tab”; rel->tab[0].primary_key_name =“order_key”; rel->tab[0].level = 0; rel->tab[0].top_element = NULL;rel->tab[0].selectDaPtr = NULL; rel->tab[0].num_col = 0;rel->tab[0].num_join = 1; rel->tab[0]->join_info[0]->col = “order_key”;rel->tab[0]->join_info[0]->num_join = 1;rel->tab[0]->join_info[0]->foreign[0].col = “o_key”;rel->tab[0]->join_info[0]->foreign[0].tab = “part_tab”;rel->tab[0]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[0]->condition = NULL; rel->tab[1].name = “part_tab”;rel->tab[1].primary_key_name = “part_key”; rel->tab[1].level = 0;rel->tab[1].top_element = NULL; rel->tab[1].selectDaPtr = NULL;rel->tab[1].num_col = 0; rel->tab[1].num_join = 2;rel->tab[1]->join_info[0]->col = “o_key”;rel->tab[1]->join_info[0]->num_join = 1;rel->tab[1]->join_info[0]->foreign[0].col = “order_key”;rel->tab[1]->join_info[0]->foreign[0].tab = “order_tab”;rel->tab[1]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[1]->join_info[1]->col = “part_key”;rel->tab[1]->join_info[1]->num_join = 1;rel->tab[1]->join_info[1]->foreign[0].col = “p_key”;rel->tab[1]->join_info[1]->foreign[0].tab = “ship_tab”;rel->tab[1]->join_info[1]->foreign[0].indexInSQDLA = DXX_NONE;rel->tab[1]->condition = NULL; rel->tab[2].name = “ship_tab”;rel->tab[2].primary_key_name = “ship_date”; rel->tab[2].top_element =NULL; rel->tab[2].level = 0; rel->tab[2].selectDaPtr = NULL;rel->tab[2].num_col = 0; rel->tab[2].num_join = 1;rel->tab[2]->join_info[0]->col = “p_key”;rel->tab[2]->join_info[0]->num_join = 1;rel->tab[2]->join_info[0]->forign[0].col = “part_key”;rel->tab[2]->join_info[0]->forign[0].tab = “part_tab”;rel->tab[2]->join_info[0]->foreign[0].indexInSQDLA = DXX_NONE;rel->tab[2]->condition = NULL; rel->num_tables = 3;

[1020] G.3.2.1 Process Relational Information

[1021] This is the second phase of the preparation process to gather allinformation to generate SQL statements. It recursively processes eachRDB_node for each attribute, text, and element node of DAD, and recordsthe mapping relationship into the REL data structure.

[1022] process_rel(rel, dom_node,current_level, dxx_type)process_rel(rel, dom_node,current_level, dxx_type) {  /* The first passfills in everything in rel, except foreign_key. */  process_rel1(rel,dom_node.current_level, dxx_type):  if (dxx_type == DXX_DECOMPOSE) {  /* The second pass fills in the foreign_key of each table in rel andupdate num_col. */   process_rel2(rel);  }  findTopElements(rel,dom_node, dxx_type);  Create a hash table or sorted array of alltop_elements  rel->top_elements for fast search. } process_rel1(rel,dom_node, dxx_type) {  if (dom_node type == element_node) {  rel->current_level++;   child = DOMGetFirstChild;   while ( child !=NULL ) {    process_rel1(rel,child,dxx_type);    child =DOMGetNextchild;   }  }  if (type == attribute_node ∥    type ==text_node ∥    (dom_node is a leaf element node && DXX_DECOMPOSE)) {  call DOMgetChild to get the RDB_node of dom_node;   if ( there is noRDB_node )   call process_RDB_node(rel,dom_node, RDB_node.dxx_type);  } if (dom_node type == element_node)  rel->current_level--; } /* Fill inthe foreign_key of each table in rel. */ process_rel2(rel) {  /* Workingvariables: */  DXX_TAB *foreignTab;  for (tabIndex = 0; tabIndex <rel->num_tables; tabIndex++) {   current_tab = rel->tab[tabIndex];   for(i = 0; i < current_tab->num_join; i++ ) { for ( j=0; j<current_tab>join_info[i].num_join; j++ ) {  foreignTab =findTabByName(rel, current_tab->join_info[i].foreign[j].tab);  if(foreignTab->level < current_tab->level) {strcpy(current_tab->foreign_key.col,current_tab->join_info[i].foreign[j].col); Find k such thatforeignTab->col2xml[k].col  equals current_tab->foreign_key.col;current_tab->foreign_key.xml =  foreignTab->col2xml[k].xml;current_tab->foreign_key.xmlType =  foreignTab->col2xml[k].xmlType;current_tab->foreign_key.dataType =  foreignTab->col2xml[k].dataType;  }}   }  } }

[1023] G.3.2.2 Process RDB_node

[1024] The following routine will process a RDB_node.

[1025] process_RDB node(rel,dom_node, rdb_node, dxx_type)process_RDB_node(rel,dom_node, rdb_node, dxx_type)  {  workingvariables: DXX_TAB *mytab; int i; /* It is attribute_node, text_node, ora leaf element_node in DAD. */ /* Note: table must be specified inRDB_node */  mytab = rel->tab[?] such that mytab.name = table ofRDB_node  if no mytab found return error;  i = mytab->num_col; mytab->col2xml[i].col=column of RDB_node;  mytab->col2xml[i].xmlLevel =current_level;  if ( dxx_type == DXX_DECOMPOSE ) {mytab->col2xml[i].dataType= type of RDB_node; }  if ( it is anattribute_node ) { mytab->col2xml[i].xml=name of the attribute if(dxx_type == DXX_DECOMPOSE)  mytab->col2xml[i].xmlType = DXX_ATTRIBUTE; }  else if (it is a text_node) { mytab->col2xml[i].xml=name of theparent element_node; if (dxx_type == DXX_DECOMPOSE) mytab->col2xml[i].xmlType = DXX_TEXT;  } else { /* It is a leafelement_node in DAD. */ mytab->col2xml[i].xml=name of element_node; if(dxx_type == DXX_DECOMPOSE)  mytab->col2xml[i].xmlType = DXX_ELEMENT;  } mytab->num_col++;  if ( condition != NULL ) add condition tomytab->condition;

[1026] Definition of a Qualifying Parent:

[1027] The parent p of an element qualifies for a table t if all of thefollowing four conditions are met;

[1028] p exists.

[1029] p is intermediate.

[1030] p has multi_occurrence==YES

[1031] p does not have any child mapped to a table other than t.

[1032] Intuitively, if a parent qualifies for a table, it is a candidateto be chosen as the top-element of the table.

[1033] Definition of Top-Element:

[1034] Let e be the highest element in DAD that belongs to a table t.

[1035] If e does not have a parent qualifying for t, then e is thetop-element of t.

[1036] Otherwise, the top-element of t is the highest element in theparent chain of e that does not have a parent qualifying for t.findTopElement(rel,dom_node, rdb_node, dxx_type)findTopElement(rel,dom_node, rdb_node, dxx_type)  { working variables:DXX_TAB *mytab; int i; /* It is attribute_node, text_node, or a leafelement_node in DAD. */ /* Note: table must be specified in RDB_node */mytab = rel->tab[?] such that mytab.name = table of RDB_node if no mytabfound return error; if ( mytab->top_element == NULL ) { parent_node =parent element_node of this dom_node; grandparent_node = parent of theparent_node; while ( grandparent_node exists && grandparent_node isintermediate && grandparent_node has multiple occurrence &&grandparent_node does not have children mapped to other tables ) {parent_node = grandparent_node; grandparent_node = parent of thegrandparent_node; } mytab->top_element = name of the parent_node; } } /*findTopElements finds the top-element for each table in rel. */ /*Because of a problem raised by Dorine Yelton, we can no longer find thetop element during process_RDB_node in process_rel1. */ /*findTopElements assumes that process_rel1 has been called. */findTopElements(rel, dom_node, dxx_type) { if (dom_node type ==element_node) { rel->current_level++; child = DOMGetFirstChild; while (child != NULL ) {  findTopElements(rel,child,dxx_type);  child =DOMGetNextchild; } } if (type == attribute_node ∥ type == text_node ∥(dom_node is a leaf element node && DXX_DECOMPOSE)) { call DOMgetChildto get the RDB_node of dom_node; if ( there is no RDB_node ) returnerror; call findTopElement(rel,dom_node, RDB_node,dxx_type); } if(dom_node type == element_node) rel->current_level--; } Example:rel->tab[0].name = “order_tab”; rel->tab[0].primary_key_name =“order_key”; rel->tab[0].foreign_key.col = “”; rel->tab[0].top_element =“Order”; rel->tab[0].selectDaPtr = NULL; rel->tab[0].num_col = 3;rel->tab[0]->col2xml[0].xml= “Key”;rel->tab[0]->col2xml[0].col=“order_key”;rel->tab[0]->col2xml[0].dataType=“integer”;rel->tab[0]->col2xml[0].xmlType=DXX_ATTRIBUTE;rel->tab[0]->col2xml[0].xmlLevel=0; rel->tab[0]->col2xml[1].xml= “name”;rel->tab[0]->col2xml[1].col=“customer_name”;rel->tab[0]->col2xml[1].dataType=“char(128)”;rel->tab[0]->col2xml[1].xmlType=DXX_TEXT;rel->tab[0]->col2xml[1].xmlLeve1=2; rel->tab[0]->col2xml[2].xml=“email”; rel->tab[0]->col2xml[2].col=“customer_email”;rel->tab[0]->col2xml[2].dataType=“char(128)”;rel->tab[0]->col2xml[2].xmlType=DXX_TEXT;rel->tab[0]->col2xml[2].xmlLevel=2; rel->tab[0].num_join = 1;rel->tab[0]->join_info[0]->col = “order_key”;rel->tab[0]->join_info[0]->num_join = 1;rel->tab[0]->join_info[0]->foreign[0].col = “o_key”;rel->tab[0]->join_info[0]->foreign[0].tab = “part_tab”;rel->tab[0]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[0]->condition = NULL; rel->tab[1].name = “part_tab”;rel->tab[1].primary_key_name = “part_key”; rel->tab[1].foreign_key.col =“order_key”; rel->tab[1].foreign_key.dataType = “integer”;rel->tab[1].foreign_key.xml = “Key”; rel->tab[1].foreign_key.xmlType =DXX_ATTRIBUTE; rel->tab[1].top_element = “Part”; rel->tab[1].selectDaPtr= NULL; rel->tab[1].num_col = 5; rel->tab[1]->col2xml[0].xml = “Color”;rel->tab[1]->col2xml[0].col = “Color”;rel->tab[1]->col2xml[0].dataType=“char(5)”;rel->tab[1]->col2xml[0].xmlType=DXX_ATTRIBUTE;rel->tab[1]->col2xml[0].xmlLevel=1; rel->tab[1]->col2xml[1].xml= “Key”;rel->tab[1]->col2xml[1].col=“part_key”;rel->tab[1]->col2xml[1].dataType=“integer”;rel->tab[1]->col2xml[1].xmlType=DXX_TEXT;rel->tab[1]->col2xml[1].xmlLevel=2; rel->tab[1]->col2xml[2].xml=“Quantity”; rel->tab[1]->col2xml[2].col=“qty”;rel->tab[1]->col2xml[2].dataType=“integer”;rel->tab[1]->col2xml[2].xmlType=DXX_TEXT;rel->tab[1]->col2xml[2].xmlLevel=2; rel->tab[1]->col2xml[3].xml=“ExtendedPrice”; rel->tab[1]->col2xml[3].col=“price”;rel->tab[1]->col2xml[3].dataType=“rea1”;rel->tab[1]->col2xml[3].xmlType=DXX_TEXT;rel->tab[1]->col2xml[3].xmlLevel=2; rel->tab[1]->col2xml[4].xml= “Tax”;rel->tab[1]->col2xml[4].col=“Tax”;rel->tab[1]->col2xml[4].dataType=“real”;rel->tab[1]->col2xml[4].xmlType=DXX_TEXT;rel->tab[1]->col2xml[4].xmlLevel=2; rel->tab[1].num_join = 2;rel->tab[1]->join_info[0]->col = “o_key”;rel->tab[1]->join_info[0]->type = “integer”;rel->tab[1]->join_info[0]->num_join = 1;rel->tab[1]->join_info[0]->foreign[0].col = “order_key”;rel->tab[1]->join_info[0]->foreign[0].tab = “order_tab”;rel->tab[1]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[1]->join_info[1]->col = “part_key”;rel->tab[1]->join_info[1]->type = “integer”;rel->tab[1]->join_info[1]->num_join = 1;rel->tab[1]->join_info[1]->foreign[0].col = “p_key”;rel->tab[1]->join_info[1]->foreign[0].tab = “ship_tab”;rel->tab[1]->join_info[1]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[1]->condition = “price > 2500.00”; rel->tab[2].name =“ship_tab”; rel->tab[2].primary_key_name = “ship_date”;rel->tab[2].foreign_key.col = “part_key”;rel->tab[2].foreign_key.dataType = “integer”;rel->tab[2].foreign_key.xml = “Key”; rel->tab[2].foreign_key.xmlType =DXX_TEXT; rel->tab[2].top_element = “Shipment”; rel->tab[2].selectDaPtr= NULL; rel->tab[2].num_col = 2; rel->tab[2]->col2xml[0].xml=“ShipDate”; rel->tab[2]->col2xml[0].col=“date”;rel->tab[2]->col2xml[0].dataType=“date”;rel->tab[2]->col2xml[0].xmlType=DXX_TEXT;rel->tab[2]->col2xml[0].xmlLevel=3; rel->tab[2]->col2xml[1].xml=“ShipMode”; rel->tab[2]->col2xml[1].col=“mode”;rel->tab[2]->col2xml[1].dataType=“char(128)”;rel->tab[2]->col2xml[1].xmlType=DXX_TEXT;rel->tab[2]->col2xml[1].xmlLevel=3; rel->tab[2].num_join = 1;rel->tab[2]->join_info[0]->col = “p_key”;rel->tab[2]->join_info[0]->type = “integer”;rel->tab[2]->join_info[0]->num_join = 1;rel->tab[2]->join_info[0]->forign[0].col = “part_key”;rel->tab[2]->join_info[0]->forign[0].tab = “part_tab”;rel->tab[2]->join_info[0]->foreign[0].indexInSQLDA = DXX_NONE;rel->tab[2]->condition = “date > ‘1966-01-01‘”; rel->num_tables = 3;

[1037] G.3.3 Prepare Tables

[1038] The following routine checks whether the tables in the relstructure exist and creates new tables if they do not exist.prepare_tables(DXX_REL *rel, tablespace)  {   working variables: DXX_TAB*current_tab;   for ( i=0; i < rel->num_tables; i++ ) { current_tab =rel->table[i]; check whether the table current_tab->name exist; if (table exists in database )   check_table(current_tab, tablespace); else  create_table(current_tab); }  }

[1039] G.3.3.1 Check Table

[1040] It is important to ensure that the existing table in the databasecomplies with the DXX_tab data structure in memory.

[1041] check_table(DXX_TAB *tab) { check_table(DXX_TAB *tab) {  /* Localvariables: */  int i;  char mytypename[18];  long mylength;  /* Localhost variables: */  char tabname[8];  char colname[8];  chartypename[18];  long length;  strcpy(tabname, tab->name);  /* Checkcol2xml. */  for (i=0; i<tab->num_col; i++) {   strcpy(colname,tab->col2xml[i].col);   /* Try to get column type and length fromsyscat. */   EXEC SQL SELECT    typename into :typename;    length into:length    FROM syscat.columns    WHERE tabname = :tabname    ANDcolname = :colname;   if (SQLCODE == 100) {    /* Error: Table<mv>tabname</mv>does not have column <mv>colname</mv>. */   }   Breakdown tab->col2xml[i].dataType into mytypename and mylength;   if(mytypename is different from typename) {    /* Error: Column<mv>colname</mv> of <mv>tabname</mv> should have type <mv>typename</mv>.*/   }   if ((length > 0) &&    (mylength > length)) {    /* Error:Column <mv>colname</mv> of <mv>tabname</mv> cannot be longer than<mv>length</mv>. */   }  } /* Check join_info. */  for (i=0;i<tab->num_join; i++) {   strcpy(colname, tab->join_info[i].col);   /*Try to get column type and length from syscat. */   EXEC SQL SELECT   typename into :typename,    length into :length    FROMsyscat.columns    WHERE tabname = :tabname    AND colname = :colname;  if (SQLCODE == 100) {    /* Error: Table <mv>tabname</mv> does nothave column <mv>colname</mv>. */   }  } }

[1042] G.3.3.2 Create New Table for Decomposition

[1043] The following routine creates a new table for decomposition.

[1044] create_table(DXX_TAB *dxx_tab, tablespace){ create_table(DXX_TAB*dxx_tab, tablespace) {  working variables:  char sql_stmt[];  charcol_stmt[];  char col_type[];  sprintf(sql_stmt,“CREATE table %s(”,dxx_tab->name);  /* create all columns from XML */  for ( i = 0; i <dxx_tab->num_col; i++ ){ init col_stmt; if ( dxx_tab->col2xml[i].col =dxx_tab->primary_key_name ) { /* key */ sprintf(col_stmt,“ %s %s notnull primary key,”, dxx_tab->col2xml[i].col,dxx_tab->col2xml[i].dataType); }  else { sprintf(col_stmt,“ %s %s,”,dxx_tab->col2xml[i].col, dxx_tab->col2xml[i].dataType); } strcat(sql_stmt,col_stmt);  } if (dxx_tab->foreign_key.col[0]) {  /*Add foreign key to sql_stmt. */  sprintf(col_stmt, “ %s %s ”,dxx_tab->join_info[0].col, dxx_tab->foreign_key.dataType); strcat(sql_stmt, col_stmt);  for ( i = 0; i < dxx_tab->num_join; i++ ){init col_stmt; if (dxx_tab->join_info[i].foreign[0].col ==dxx_tab->foreign_key.col) {  sprintf(col_stmt,“REFERENCES %s(%s))”,dxx_tab->join_info[i].foreign[0].tab,dxx_tab->join_info[i].foreign[0].col); }  }  strcat(sql_stmt,col_stmt); }  Execute sql_stmt; } examples: create table order_tab (order_keyinteger not null primary key, customer_name char(32), customer_emailchar(100)); create table part_tab (part_key integer not null primarykey, price real, qty integer, tax real, o_key integer REFERENCESorder_tab(order_key)); create table ship_tab (ship_date date not nullprimary key, ship_mode char(64), comments char(2k), p_key integerREFERENCES part_tab(part_key));

[1045] examples:

[1046] create table order_tab (order_key integer not null primary key,

[1047] customer_name char(32),

[1048] customer_email char(100));

[1049] create table part_tab (part key integer not null primary key,

[1050] price real,

[1051] qty integer,

[1052] tax real,

[1053] o_key integer REFERENCES order_tab(order_key));

[1054] create table ship_tab (ship_date date not null primary key,

[1055] ship_mode char(64),

[1056] comments char(2k),

[1057] p_key integer REFERENCES part tab(part key));

[1058] G.3.3.3 Insert XML Document Into an Enabled Collection

[1059] The following insert will insert an XML document into an enabledcollection.

[1060] dxxInsert(collection_name,xmlobj,sqlstate, msgtext)

[1061] collection_name—the name of enabled XML collection

[1062] xmlobj—the input XML document

[1063] sqlstate———OUT: the SQL state in case of errors

[1064] msgtext—OUT: message text in case of errors

[1065] Note that in the following routine, the sqlstate and msgtext areignored in this embodiment. dxxInsert(collection_name, xmlobj, sqlstate,msgtext) { working variable: DOMElement dad_root_element;DOMNode xml_root_element; DOMNodeList rootXMLNodeList;DXX_REL relation_info; DXX_REL *rel = &relation_info; /* retrieve DADfrom XML_usage table */ select DAD from xml_usage where col_name =collection /* get DAD and parse */ parse DAD; set dad_root_element to bethe root element_node of DAD; /* set relational structure */  setRel(rel,dad,DXX_DECOMPOSE); /* process relational information */  process_rel(rel,dad_root_element,0,DXX_DECOMPOSE); /* parse the xmlobjto get the DOM tree of XMLobj */ parse xmlobj to get the xmldom of XMLobj; xml_root_element = the root element of xmldom; create arootXMLNodeList, which has only one node: xml_root_element; /* decomposeXML obj */ decompose_element(rel,rootXMLNodeList,dad_root_element); }

[1066] G.3.3.4 Decompose Element

[1067] The following routine decomposes an element. decompose_element(DXX_REL *rel, DOMNodeList *xml_element_nodes, DOMElement*dad_element_node) {  working variables:   char sql_stmr[]; /* stmt forinsert */ char pre_insrt_stmt[]; /* stmt for insert */ chardad_element_name[]; DXX_TAB * current_tab; DXX_ROWS dxx_row; DXX_ROWS *dxx_rows = &dxx_row; char row[]; HSTMT hstmt; 1. call DOMgetAttribute toget element_name from dad_element_node; 2. find current_tab =rel−>table[i] such that rel−>table[i].top_element = element_name;  ifnot found then current_tab = rel−>table[i] such thatcurrent_tab−>col2xml[?]1.xml = element_name if (current_tab not found) {/* Call decompose_element on each children. */ for each child node ofthe dad_element_node do  if (the child node is an element_node) {  child_name = child_dad_element_node's name:   childXMLNodes = NULL;  call dxx_getNodeList (xml_element_nodes, child_name, childXMLNodes);  call decompose_element (rel, childXMLnodes, child_dad_element_node); } } else { 3. if (current_tab−>sql_stmt[0] == ‘\0’) {  /* Generate theSQL insertion stmt with parameter markers. */  call generate_SQLi(current_tab, pre_insert_stmt) ; else  strcpy (pre_insert_stmt,current_tab−>sql_stmt) ; 4. /* Fill the rows with the data fromxml_element_nodes */  call gen_rows (rel, current_tab,xml_element_nodes, dad_element_node. dxx_rows) ; 5. /* insert the rows*/  CLI_AllocHandle (SQL_HANDLE_STMT, rel−>hdbc, &hstmt) ; CLI_Prepare(hstmt, sql_stmt) ; for ( i = 0; i < dxx_rows−>num_rows; i++) {   initsql_stmt;   dxxrow_to_stmt (pre_insert_stmt. dxx_rows−>row[i],current_tab−>col2xml, hstmt);   CLI_Execute (hstmt) ;   Freedxx_rows−>row[i];  }  Free dxx_rows−>row; CLI_FreeHandle (hstmt) ; 6.for each child node of the dad_element_node do { if( (the child node isan element_node ) AND (current_tab−>top_element != child_name) AND(current_tab−>col2xml[?].xml != child_name)) { child_name =child_dad_element_node's name; childXMLNodes = NULL; calldxx_getNodeList (xml_element_nodes, child_name, childXMLNodes) ; calldecompose_element (rel, childXMLnodes, child_dad_element_node) ; else /*covered by this one */  ; /* do nothing */ }  } /* end if (current_tabnot found) */ }

[1068] G.3.3.4.1 Get Node List

[1069] The following routine returns a list of nodes with a specifiedelement name from a right parent chain. dxx_getNodeList(DOM_NodeListxml_element_nodes, char *child_name, DOM_NodeList *returnNodeList) { /*Local variables: */ int n; int i; n = xml_element_nodes.getLength(); for(i=0; i<n; i++) { xml_element = xml_element_nodes.item(i); children =xml_element.getChildren(); for each xml_child in the children do {  if (xml_child_name == child_name ) {  add xml_child to returnNodeList; found = TRUE;  } } if ( not found ) {dxx_getNodeList(children,element_name,returnNodeList); } } /* end of fori */ return mynodeList; }

[1070] G.3.3.4.2 Generate SQL

[1071] The following routine generates an SQL insertion statement withparameter markers. generate_SQLi(DXX_TAB *current_tab, /* input */ char*sql_stmt) /* output */ { working variables: char[] insert_stmt; /*Generate the first part of the statement,  such as: INSERT INTO t1(n,x,c)*/ sprintf(insert_stmt,“insert into %s (”,current_tab—>name); for(i=0; i< current_tab—>num_col; i++)  sprintf(insert_stmt,“%s”,current_tab—>col2xm[i].col); } /* Identify columns as the foreignkey to parent table*/ for (i=0; i<current_tab—>for_key.num_col; i++) { strcat(insert_stmt, current_tab—>join_info[i].col); strcat(insert_stmt, “, ”); } Remove the last comma from insert_stmt;strcat(insert_stmt, “)”); strcpy(sql_stmt, insert_stmt); /* Generate therest of the statement with parm_markers.  For example, a completestatement may look like:   INSERT INTO t1 (n,x,c) SELECT * FROM  TABLE(VALUES (CAST(? AS INTEGER),CAST(? AS DOUBLE), CAST(? ASCHAR(5)))) AS t(n,x,c)   WHERE x≧2 */ strcpy(colNarnes,strchr(insert_stmt, ‘(’)); parm_markers[0] =‘\0’; for (i=0;i<current_tab—>num_col; i++) {  strcat(parm_markers, “CAST(? AS ”); strcat(parm_markers, current_tab—>col2xml[i].dataType); strcat(parm_markers, “), ”); } for (1=0;i<current_tab—>for_key.num_col; 1++) {  strcat(parm_markers, “CAST(? AS”);  strcat(parm_markers, current_tab—>for key.col[i].dataType); strcat(parm_markers, “), ”); } Remove the last comma from parm_markers;sprintf(sql_stmt, “%s SELECT * FROM TABLE( VALUES “ “(%s)) AS t%s WHERE%s”, insert_stmt, parm_markers, colNames, current_tab—>condition); }Example: 1. INSERT INTO order_tab (order_key, customer_name,customer_email)  SELECT * FROM TABLE(VALUES (CAST(? AS INTEGER), CAST(?AS  CHAR(128)), CAST(? AS CHAR(128)))) AS  t(order_key, customer_name,customer_email) 2. INSERT INTO part_tab (color, part_key, o_key) SELECT * FROM TABLE(VALUES (CAST(? AS CHAR(10)),CAST(? AS INTEGER),CAST(? AS INTEGER))) AS  t(color, part_key, o_key)  WHERE price >2500.00 3. INSERT INTO ship_tab (ship_date, ship_mode, p_key)  SELECT *FROM TABLE( VALUES (CAST(? AS DATE), CAST(? AS CHAR(128)), CAST(? ASINTEGER))) AS  t(ship_date, ship_mode, p_key)  WHERE date > ‘1966-01-01’

[1072] G.3.3.4.3 Generate Row Data Structure

[1073] The following routine generates the row data structures from thedata of xml_elements. gen_rows(DXX_REL *rel, DXX_TAB *current_tab,DOMNodeList xml_element_nodes, DOMElement dad_element_node, DXX_ROWS*dxxrows) {  working valuble: DXX_ROW *row;  get the element_name bycall DOMGetAttribute of dad_element_node;  /* initialize the internaldata structure */  dxxrows->num_rows = 0;  for(i = 0; i< MAX_NUM_ROWS;i++) dxxrows->row[i] = NULL;  /* loop to work on each instance of thexml_element_node */  for (i = 0; i < xml_element_nodes.length; i++) {xml_element = (element) xml_element_nodes.item(i); call get_rows(rel,current_tab,xml_element,dad_element_node,dxxrows); } }

[1074] G.3.3.4.4 Get Row Data from an XML Element

[1075] The following routine gets the row data from an XML element.get_rows(DXX_REL *rel, DXX_TAB *current_tab, DOMElement xml_element,DOMElement dad_element_node, DXX_ROWDATA *dxxrows) {  callDOMgetChildNodes to get a list of all children of dad_element_node  foreach child_node, do {   if (isChildAttrNode || isChildTextNode ||isChildElementNode) { set isChildInCurrentTab to TRUE, if the child_nodename equals current_tab—>top_element or in its col2xml[]; setisChildTopOfAnotherTab to TRUE if the child_node name equals toanother_tab—>top_element where another_tab != current_tab. Userel—>top_elements for fast search. if (isChildInCurrentTab ||isChildIntermediate || isChildTextNode) { set isChildAttrNode to TRUE ifthe child node is attribute_node; if(isChildAttrNode) {  getattribute_name from child_node  get_coldata(rel,current_tab,dxxrows,DXX_ATTRIBUTE, xml_element,attribute_name); } if(isChildTextNode) { get element_name from child_node; get_coldata(rel,current_tab,dxxrows,DXX_TEXT. xml_element, element_name); } if(isChildElementNode && !isChildTopOfAnotherTab) { get name of the childnode; Construct a single element node list (xml_element_nodes) forxml_element; nodeList = getNodeList(xml_element_nodes,child_element_name); for each node in Nodlist do {  call get_rows(rel,current_tab.child_xml_element, dad_child_element_node, dxxrows);   }  }} else { /* Child is not in current tab. Do nothing. */  }   } } if(there is no significant child && dad_element_node is in current_tab) { /* This dad_element_node is a leaf-node and should be inserted withoutdecomposition. */  get_coldata(rel, current_tab, dxxrows, DXX_ELEMENT,xml_element, element_name);  } }

[1076] G.3.3.4.5 Get Data from Column

[1077] The following routine gets data from a column.

[1078] /* Get data from its column. */ /* Get data from its column. */get_coldata(DXX_REL *rel, DXX_TAB *current_tab, DXX_ROWS *dxxrows, intxmlType, DOMElement xml_element, char *xml_name) {  /* workingvariables: */  int colIndex;  int colType;  int xmlLevel;  DXX_ROW *row; /* find column index */  for (i = 0; i < current_tab—>num_col; i++) {60 if(current_tab—>col2xml[i].xml = xml_name) {  colIndex = i;  colType= current_tab—>col2xml[i].dataType;  xmlLevel =current_tab—>col2xml[i].xmlLevel;  break;  } } if(xmlType ==DXX_ATTRIBUTE) {  data = xml_element.DOMGetAttribute(Xml_name); } elseif(xmlType == DXX_TEXT) {  xml_text_nodelist = the text child nodes ofxml_element; Concatenate all the text from xml_text_nodelist and set itto xml_text_node; data = xml_text_node.getNodeValue(); }else { /*xmlType == DXX_ELEMENT */ data = serialize xml_element as XML; } length= strlen(data); if(colType is char, varchar, date, time, or timestamp) { /* Enclose the data with single-quotes. */  check length with colType; length+2; } if (dxx_rows—>num_rows == 0 ) {  allocate dxx_rows—>row[0]; dxx_rows—>num_rows++;  row = dxx_rows—>row[0];  row—>num_col =current_tab—>num_col;  allocate row—>coldata   as an array of sizerow—>num_col;  memset row—>coldata[...] to NULL; get_foreign_key_value(rel, xml_element, current_tab—>foreion_key,&(row—>coldata[row—>num_col−1])); } else {  previous_row =dxx_rows[dxx_rows—>num_rows−1];  if (previous_row—>coldata[colIndex] ==NULL) {  /* we know that it is not filled */  row = previous_row; } else{  /* previous_row's coldata is filled */  allocatedxx_rows[dxx_rows—>num_rows]  row = dxx_rows[dxx_rows—>num_rows]; row—>num_col = current_tab—>num_col;  allocate row—>coldata   as anarray of size row—>num_col; copy_row(row,previous_row,colIndex,current_tab); get_foreign_key_value(rel, xml_element, current_tab—>foreign_key,&(row—>coldata[row—>num_col−1])); dxx_rows—>num_rows++;   } }row—>coldata[colIndex]=malloc(length+1); copy data to row—>coldata[i],add single-quotes if needed }

[1079] G.3.3.4.6 Get Foreign Key Values

[1080] /* Get foreign key value by climbing the DOM tree. */ /* Getforeign key value by climbing the DOM tree. */ getforeign_key_value(DXX_REL *rel, DOM_Element xml_element, DXX_COL2XML*foreign_key, char **foreion_key_value) /* OUT */. { /* workingvariables: */ DOM_Element parent; char  *data; DOM_NodeList nl;DOM_Element e; parent = xml_element.getParentNode(); while (parent is inthe same table as xml_element || parent is an intermediate node) {parent = parent.getParentNode(); } switch(foreign_key—>xmlType) { caseDXX_ATTRIBUTE:  data = parent.getAttribute(foreign_key—>xml);  break;case DXX_TEXT:  nl = parent.getElementsByTagName(foreign_key—>xml);  e =nl.item(0); /* the first element whose name is foreion_key—>xml */  data= e.getFirstChild().getNodeValue();  break; default: /*error: unknownforeign_key_xmlType */ }/* end of switch(foreign_key_xmlType) */Allocate *foreign_key_value; Copy data into * foreign key value and enclose the data with single-quotes if needed  according to foreignkey—>dataType; } /* Copy the columns of a row down to the current level.*/ copy_row(DXX_ROW *row, DXX_ROW *previous_row, int colIndex, DXX_TAB*current_tab) { num_cols = previous_row—>num_cols +(current_tab—>foreign_key.col[0])? 1: 0; for(i = 0; i < num_cols;i++) { previousLevel = current_tab—>col2xml[i].xmlLevel; if(previous_row—>coldata[i] != NULL && i !=colIndex   && previousLevel≦ current_tab—>level) {   row—>coldata[i]strdup(previous_row—>coldata[i]);  } else {   row—>coldata[i] = NULL;  } } }

[1081] G.3.3.4.7 Insert Bind

[1082] The following routine binds the parameters of an INSERT statementwith the data in a row structure using CLI. dxxrow_to_stmt(char *prefix,DXX_ROW *row, COL2XML *col2xml, SQLHSTMT hstmt) {  /* Local variables:*/  CLI_type. CLI_size, CLI_digits, bufLen;  int i;  for (i=0;i<row—>num_col; i++) {  Get the CLI parameters CLI_type, CLI_size, andCLI_digits.  CLI_BindParameter(hstmt, i+l, SQL_PARAM_INPUT.SQL_C_CHAR.CLI_type, CLI_size CLI_digits, row—>coldata[i], bufLen, NULL);  } }

[1083] G.3.3.5 Shred XML Document into DB2 Databases

[1084] The following routine shreds an XML document into a DB2 database.The stored procedure dxxShredXML( ) works the same as dxxInsertXML( )except that it takes a DAD as the first input parameter instead of aname of an enabled XML collection. Therefore, it can be called withoutenabling an XML collection.

[1085] The stored procedure dxxShredXML( ) inserts an input XML documentinto an enabled XML collection according to the Xcollectionspecification in the input DAD. If the tables used in the Xcollection ofthe DAD do not exist or the columns do not meet the data types specifiedin the DAD mapping, an error will be returned. The stored proceduredxxShredXML( ) decomposes the input XML document and inserts fragmentedXML data into the tables specified in the DAD.

[1086] Stored Procedure Declaration

[1087] dxxShredXML(char(128) DAD_buf, /* input */

[1088] CLOB xmlobj, /* input */

[1089] long *errCode, /* output */

[1090] varchar(1024) *errMsg) /* output */

[1091] Parameters:

[1092] DAD_buf: a buffer containing the DAD,

[1093] xmlobj: IN, an XML document object in XMLCLOB type.

[1094] errCode: OUT, return code in case of error,

[1095] errMsg:OUT, message text in case of error.

[1096] The following is an example of the dxxShredXML( ) call. EXEC SQLINCLUDE SQLCA; EXEC SQL BEGIN DECLARE SECTION;  char DAD_buf[128]; /*name of an XML collection */  SQL TYPE is CLOB_FILE xmlDoc;/* input XMLdocument */  long errCode; /* error code */  char errMsg[1024]; /* errormessage text */  short dad_ind  short xmlDoc_ind;  short errCode_ind; short errMsg_ind;  EXEC SQL END DECLARE SECTION;  /*....... suppose theptrBuf has DAD content */  /* initialize host variable and indicators */ strcpy(DAD_buf,ptrBuf);  strcpy(xmlobj.name,“e:\xml\order1.xml”); xmlobj.name_length=strlen(“e:\xml\order1.xml”); xmlobj.file_option=SQL_FILE_READ;  errCode = 0;  errMsg[0] =‘\0’; dad_ind = 0;  xmlobj_ind = 0;  errCode_ind = −1;  errMsg_ind = −1;  /*Call the stored procedure */  EXEC SQL CALLdb2xml!dxxShredXML(:DAD_buf:dad_ind; :xmlobj:xmlob_ind,:errCode:errCode_ind,:errMsg:errMsg_ind);

[1097] If the content of DAD_buf has the Litem_DAD3.dad content, thenthe dxxShredXML( ) call will decompose the input XML document“e:\xml\order1.xml” and insert data into the sales_order collectiontables.

CONCLUSION

[1098] This concludes the description of an embodiment of the invention.The following describes some alternative embodiments for accomplishingthe present invention. For example, any type of computer, such as amainframe, minicomputer, or personal computer. or computerconfiguration, such as a timesharing mainframe, local area network, orstandalone personal computer, could be used with the present invention.

[1099] The foregoing description of an embodiment of the invention hasbeen presented for the purposes of illustration and description. It isnot intended to be exhaustive or to limit the invention to the preciseform disclosed. Many modifications and variations are possible in lightof the above teaching. It is intended that the scope of the invention belimited not by this detailed description, but rather by the claimsappended hereto.

What is claimed is:
 1. A method of locating data in a data storeconnected to a computer, the method comprising the steps of: creating amain table having a column for storing a document, wherein the documenthas one or more elements or attributes; creating one or more sidetables, wherein each side table stores one or more elements orattributes; and using the side tables to locate data in the main table.2. The method of claim 1, wherein the document in the column is anextensible markup language document.
 3. The method of claim 1, whereinone or more side tables are created after the column for storing thedocument is enabled.
 4. The method of claim 1, further comprisinggenerating the side tables using a data access definition.
 5. The methodof claim 4, further comprising providing a graphical user interface toenable a user to create the data access definition.
 6. The method ofclaim 1, further comprising converting the elements or attributes to SQLdata types.
 7. The method of claim 1, further comprising generating oneor more triggers to provide synchronization between the main table andside tables.
 8. The method of claim 7, wherein a trigger is activatedupon data being inserted into the column for storing a document.
 9. Themethod of claim 7, wherein a trigger is activated upon data beingmodified in the column for storing a document.
 10. The method of claim1, wherein data is located using a location path.
 11. The method ofclaim 1, further comprising creating an index on each side table. 12.The method of claim 11, wherein locating data in the main table furthercomprises: searching for the data in the side tables using the indexes;and mapping data located in the side tables to data in the main table.13. The method of claim 1, further comprising searching from a joinview.
 14. The method of claim 1, further comprising receiving a query onone or more side tables and using the side tables to locate data in thequeried side tables.
 15. The method of claim 1, further comprising usingan extracting user-defined function to locate data.
 16. The method ofclaim 1, further comprising searching on an element or attribute withmultiple occurrences.
 17. The method of claim 1, further comprisingperforming a text search on the document.
 18. The method of claim 1,further comprising performing a range search.
 19. The method of claim 1,further comprising enabling the column.
 20. The method of claim 1,further comprising disabling the column.
 21. An apparatus for locatingdata in a data store, comprising: a computer having a data store coupledthereto, wherein the data store stores data; and one or more computerprograms, performed by the computer, for creating a main table having acolumn for storing a document, wherein the document has one or moreelements or attributes, for creating one or more side tables, whereineach side table stores one or more elements or attributes, and for usingthe side tables to locate data in the main table.
 22. The apparatus ofclaim 21, wherein the document in the column is an extensible markuplanguage document.
 23. The apparatus of claim 21, wherein one or moreside tables are created after the column for storing the document isenabled.
 24. The apparatus of claim 21, further comprising generatingthe side tables using a data access definition.
 25. The apparatus ofclaim 24, further comprising providing a graphical user interface toenable a user to create the data access definition.
 26. The apparatus ofclaim 21, further comprising converting the elements or attributes toSQL data types.
 27. The apparatus of claim 21, further comprisinggenerating one or more triggers to provide synchronization between themain table and side tables.
 28. The apparatus of claim 27, wherein atrigger is activated upon data being inserted into the column forstoring a document.
 29. The apparatus of claim 27, wherein a trigger isactivated upon data being modified in the column for storing a document.30. The apparatus of claim 21, wherein data is located using a locationpath.
 31. The apparatus of claim 21, further comprising creating anindex on each side table.
 32. The apparatus of claim 31, whereinlocating data in the main table further comprises: searching for thedata in the side tables using the indexes; and mapping data located inthe side tables to data in the main table.
 33. The apparatus of claim21, further comprising searching from a join view.
 34. The apparatus ofclaim 21, further comprising receiving a query on one or more sidetables and using the side tables to locate data in the queried sidetables.
 35. The apparatus of claim 21, further comprising using anextracting user-defined function to locate data.
 36. The apparatus ofclaim 21, further comprising searching on an element or attribute withmultiple occurrences.
 37. The apparatus of claim 21, further comprisingperforming a text search on the document.
 38. The apparatus of claim 21,further comprising performing a range search.
 39. The apparatus of claim21, further comprising enabling the column.
 40. The apparatus of claim21, further comprising disabling the column.
 41. An article ofmanufacture comprising a program storage medium readable by a computerand embodying one or more instructions executable by the computer toperform method steps for locating data in a data store connected to thecomputer, the method comprising the steps of: creating a main tablehaving a column for storing a document, wherein the document has one ormore elements or attributes; creating one or more side tables, whereineach side table stores one or more elements or attributes; and using theside tables to locate data in the main table.
 42. The article ofmanufacture of claim 41, wherein the document in the column is anextensible markup language document.
 43. The article of manufacture ofclaim 41, wherein one or more side tables are created after the columnfor storing the document is enabled.
 44. The article of manufacture ofclaim 41, further comprising generating the side tables using a dataaccess definition.
 45. The article of manufacture of claim 44, furthercomprising providing a graphical user interface to enable a user tocreate a data access definition.
 46. The article of manufacture of claim41, further comprising converting the elements or attributes to SQL datatypes.
 47. The article of manufacture of claim 41, further comprisinggenerating one or more triggers to provide synchronization between themain table and side tables.
 48. The article of manufacture of claim 47,wherein a trigger is activated upon data being inserted into the columnfor storing a document.
 49. The article of manufacture of claim 47,wherein a trigger is activated upon data being modified in the columnfor storing a document.
 50. The article of manufacture of claim 41,wherein data is located using a location path.
 51. The article ofmanufacture of claim 41, further comprising creating an index on eachside table.
 52. The article of manufacture of claim 51, wherein locatingdata in the main table further comprises: searching for the data in theside tables using the indexes; and mapping data located in the sidetables to data in the main table.
 53. The article of manufacture ofclaim 41, further comprising searching from a join view.
 54. The articleof manufacture of claim 41, further comprising receiving a query on oneor more side tables and using the side tables to locate data in thequeried side tables.
 55. The article of manufacture of claim 41, furthercomprising using an extracting user-defined function to locate data. 56.The article of manufacture of claim 41, further comprising searching onan element or attribute with multiple occurrences.
 57. The article ofmanufacture of claim 41, further comprising performing a text search onthe document.
 58. The article of manufacture of claim 41, furthercomprising performing a range search.
 59. A method of transforming datastored on a data storage device that is connected to a computer, themethod comprising: receiving a query that selects data in the datastorage device; retrieving the selected data into a work space; andgenerating one or more XML documents to consist of the selected data.60. The method of claim 59, wherein the work space comprises a tablehaving one or more columns and wherein the one or more XML documents aregenerated by mapping each column to an element or attribute of one ofthe XML documents.
 61. The method of claim 59, wherein the one or moreXML documents are generated using a data access definition.
 62. Themethod of claim 61, further comprising using, a document type definitionto validate the one or more XML documents.
 63. The method of claim 61,further comprising using a document type definition to prepare thedocument access definition.
 64. The method of claim 61, wherein thedocument access definition further comprises an Extensible MarkupLanguage Path data model based definition of the one or more XMLdocuments to be generated.
 65. The method of claim 59, wherein the workspace comprises a table and further comprising mapping column names of atable to equivalence classes.
 66. The method of claim 65, wherein theequivalence classes are defined by a user. 67 The method of claim 65,wherein the equivalence classes are defined by a heuristic approach. 68.The method of claim 59, further comprising removing duplicates from theselected data.
 69. The method of claim 59, wherein generating one ormore XML documents comprises using an Xcollection definition thatdefines how to compose the one or more XML documents from the retrievedselected data.
 70. The method of claim 68, wherein the Xcollectiondefinition is contained in a data access definition.
 71. The method ofclaim 68, wherein the Xcollection definition comprises an SQL_queryelement.
 72. The method of claim 59, further comprising, prior toretrieving data, parsing a document access definition.
 73. The method ofclaim 59, wherein the one or more XML documents are generated using aquery, XML composition stored procedures, and a document accessdefinition.
 74. The method of claim 59, wherein the data to generate oneor more XML documents is stored in an XML collection.
 75. The method ofclaim 59, wherein the one or more XML documents are shared betweenbusinesses.
 76. The method of claim 59, wherein the one or more XMLdocuments are generated b y stored procedures.
 77. The method of claim59, wherein the stored procedures can be called from database clientcode.
 78. An apparatus for transforming data, comprising: a computerhaving a data store coupled thereto, wherein the data store stores data;and one or more computer programs, performed by the computer, forreceiving a query that selects data in the data storage device,retrieving the selected data into a work space, and generating one ormore XML documents to consist of the selected data.
 79. The apparatus ofclaim 78, wherein the work space comprises a table having one or morecolumns and wherein the one or more XML documents are generated bymapping each column to an element or attribute of one of the XMLdocuments.
 80. The apparatus of claim 78, wherein the one or more XMLdocuments are generated using a data access definition.
 81. Theapparatus of claim 80, further comprising using a document typedefinition to validate the one or more XML documents.
 82. The apparatusof claim 80, further comprising using a document type definition toprepare the document access definition.
 83. The apparatus of claim 80,wherein the document access definition further comprises an ExtensibleMarkup Language Path data model based definition of the one or more XMLdocuments to be generated. 84 The apparatus of claim 78, wherein thework space comprises a table and further comprising creating mappingcolumn names of a table to equivalence classes.
 85. The apparatus ofclaim 84, wherein the equivalence classes are defined by a user.
 86. Theapparatus of claim 84, wherein the equivalence classes are defined by aheuristic approach.
 87. The apparatus of claim 78, further comprisingremoving duplicates from the selected data.
 88. The apparatus of claim78, wherein generating one or more XML documents comprises using anXcollection definition that defines how to compose the one or more XMLdocuments from the retrieved selected data.
 89. The apparatus of claim88, wherein the Xcollection definition is contained in a data accessdefinition.
 90. The apparatus of claim 88, wherein the Xcollectiondefinition comprises an SQL query element.
 91. The apparatus of claim78, further comprising, prior to retrieving data, parsing a documentaccess definition.
 92. The apparatus of claim 78, wherein the one ormore XML documents are generated using a query, XML composition storedprocedures, and a document access definition.
 93. The apparatus of claim78, wherein the data to generate one or more XML documents is stored inan XML collection.
 94. The apparatus of claim 78, wherein the one ormore XML documents are shared between businesses.
 95. The apparatus ofclaim 78, wherein the one or more XML documents are generated by storedprocedures.
 96. The apparatus of claim 78, wherein the stored procedurescan be called from database client code.
 97. An article of manufacturecomprising a program storage medium readable by a computer and embodyingone or more instructions executable by the computer to perform methodsteps for transforming data in a data store connected to the computer,the method comprising the steps of: receiving a query that selects datain the data storage device; retrieving the selected data into a workspace; and generating one or more XML documents to consist of theselected data.
 98. The article of manufacture of claim 97, wherein thework space comprises a table having one or more columns and wherein theone or more XML documents are generated by mapping each column to anelement or attribute of one of the XML documents.
 99. The article ofmanufacture of claim 97, wherein the one or more XML documents aregenerated using a data access definition.
 100. The article ofmanufacture of claim 99, further comprising using a document typedefinition to validate the one or more XML documents.
 101. The articleof manufacture of claim 99, further comprising using a document typedefinition to prepare the document access definition.
 102. The articleof manufacture of claim 99, wherein the document access definitionfurther comprises an Extensible Markup Language Path data model baseddefinition of the one or more XML documents to be generated.
 103. Thearticle of manufacture of claim 97, further comprising mapping columnnames of a table to equivalence classes.
 104. The article of manufactureof claim 103, wherein the equivalence classes are defined by a user.105. The article of manufacture of claim 103, wherein the equivalenceclasses are defined by a heuristic approach.
 106. The article ofmanufacture of claim 97, further comprising removing duplicates from theselected data.
 107. The article of manufacture of claim 97, whereingenerating one or more XML documents comprises using an an Xcollectiondefinition that defines how to compose the one or more XML documentsfrom the retrieved selected data.
 108. The article of manufacture ofclaim 107, wherein the Xcollection definition is contained in a dataaccess definition.
 109. The article of manufacture of claim 107, whereinthe Xcollection definition comprises an SQLquery element.
 110. Thearticle of manufacture of claim 97, further comprising, prior toietrieving data, parsing a document access definition.
 111. The articleof manufacture of claim 97, wherein the one or more XML documents aregenerated using a query, XML composition stored procedures, and adocument access definition.
 112. The article of manufacture of claim 97,wherein the data to generate one or more XML documents is stored in anXML collection.
 113. The article of manufacture of claim 97, wherein theone or more XML documents are shared between businesses.
 114. Thearticle of manufacture of claim 97, wherein the one or more XMLdocuments are generated by stored procedures.
 115. The article ofmanufacture of claim 97, wherein the stored procedures can be calledfrom database client code.
 116. A method of transforming data stored ona data storage device that is connected to a computer, the methodcomprising: generating a document object model tree using a documentaccess definition: traversing the document object model tree to obtaininformation to retrieve relational data; and mapping the relational datato one or more XML documents.
 117. The method of claim 116, wherein thedocument access definition defines a mapping between the relational dataand one or more XML documents.
 118. The method of claim 116, wherein thedocument object model tree comprises one or more relational databasenodes.
 119. The method of claim 118, wherein a relational database nodecomprises an attribute node that maps to a column of a relationaldatabase table.
 120. The method of claim 118, wherein a relationaldatabase node comprises an element node that maps to a column of arelational database table.
 121. The method of claim 118, wherein arelational database node comprises a text node that maps to a column ofa relational database table.
 122. The method of claim 118, wherein therelational database node identifies a relational table into which XMLdocument data is to be stored.
 123. The method of claim 118, wherein therelational database node identifies a column in a relational table thatcontains XML document data to be retrieved.
 124. The method of claim118, wherein the relational database node identifies one or morepredicates used to select column data from a relational table to storeinto one or more XML documents.
 125. The method of claim 118, whereinthe relational database node identifies a join relationship for joiningmultiple tables.
 126. The method of claim 125, wherein the relationaldatabase node identifies a primary and foreign key relationship for thejoin relationship.
 127. The method of claim 116, further comprisinggenerating queries to obtain relational data using the document objectmodel tree.
 128. The method of claim 116, wherein the relational datacomprises an attribute value to be written to an XML document.
 129. Themethod of claim 116, wherein the relational data comprises element textto be written to an XML document.
 130. The method of claim 116, furthercomprising a stored procedure that receives the document accessdefinition and outputs a table populated with the one or more XMLdocuments.
 131. An apparatus for transforming data, comprising: acomputer having a data store coupled thereto, wherein the data storestores data; and one or more computer programs, performed by thecomputer, for generating a document object model tree using a documentaccess definition, traversing the document object model tree to obtaininformation to retrieve relational data, and mapping the relational datato one or more XML documents.
 132. The apparatus of claim 131, whereinthe document access definition defines a mapping between the relationaldata and one or more XML documents.
 133. The apparatus of claim 131,wherein the document object model tree comprises one or more relationaldatabase nodes.
 134. The apparatus of claim 133, wherein a relationaldatabase node comprises an attribute node that maps to a column of arelational database table.
 135. The apparatus of claim 133, wherein arelational database node comprises an element node that maps to a columnof a relational database table.
 136. The apparatus of claim 133, whereina relational database node comprises a text node that maps to a columnof a relational database table.
 137. The apparatus of claim 133, whereinthe relational database node identifies a relational table into whichXML document data is to be stored.
 138. The apparatus of claim 133,wherein the relational database node identifies a column in a relationaltable that contains XML document data to be retrieved.
 139. Theapparatus of claim 133, wherein the relational database node identifiesone or more predicates used to select column data from a relationaltable to store into one or more XML document s.
 140. The apparatus ofclaim 133, wherein the relational database node identifies a joinrelationship for joining multiple tables.
 141. The apparatus of claim140, wherein the relational database node identifies a primary andforeign key relationship for the join relationship.
 142. The apparatusof claim 131, further comprising generating queries to obtain relationaldata using the document object model tree.
 143. The apparatus of claim131. wherein the relational data comprises an attribute value to bewritten to an XML document.
 144. The apparatus of claim 131, wherein therelational data comprises element text to be written to an XML document.145. The apparatus of claim 131, further comprising a stored procedurethat receives the document access definition and outputs a tablepopulated with the one or more XML documents.
 146. An article ofmanufacture comprising a program storage medium readable by a computerand embodying one or more instructions executable by the computer toperform steps for transforming data in a data store connected to thecomputer, comprising: generating a document object model tree using adocument access definition; traversing the document object model tree toobtain information to retrieve relational data; and mapping therelational data to one or more XML documents.
 147. The article ofmanufacture of claim 146, wherein the document access definition definesa mapping between the relational data and one or more XML documents.148. The article of manufacture of claim 146, wherein the documentobject model tree comprises one or more relational database nodes. 149.The article of manufacture of claim 148, wherein a relational databasenode comprises an attribute node that maps to a column of a relationaldatabase table.
 150. The article of manufacture of claim
 148. wherein arelational database node comprises an element node that maps to a columnof a relational database table.
 151. The article of manufacture of claim148, wNhlerein a relational database node comprises a text node thatmaps to a column of a relational database table.
 152. The article ofmanufacture of claim 148, wherein the relational database nodeidentifies a relational table into which XML document data is to bestored.
 153. The article of manufacture of claim 148,wherein therelational database node identifies a column in a relational table thatcontains XML document data to be retrieved.
 154. The article ofmanufacture of claim 148, wherein the relational database nodeidentifies one or more predicates used to select column data from arelational table to store into one or more XML documents.
 155. Thearticle of manufacture of claim 148, wherein the relational databasenode identifies a join relationship for joining multiple tables. 156.The article of manufacture of claim 155, wherein the relational databasenode identifies a primary and foreign key relationship for the joinrelationship.
 157. The article of manufacture of claim 146, furthercomprising generating queries to obtain relational data using thedocument object model tree.
 158. The article of manufacture of claim146, wherein the relational data comprises an attribute value to bewritten to an XML document.
 159. The article of manufacture of claim146, wherein the relational data comprises element text to be written toan XML document.
 160. The article of manufacture of claim 146, furthercomprising a stored procedure that receives the document accessdefinition and outputs a table populated with the one or more XMLdocuments.
 161. A method of transforming data stored on a data storethat is connected to a computer, comprising: receiving an XML documentcontaining XML data; receiving a document access definition thatidentifies one or more relational tables and columns; and mapping theXML data to the relational tables and columns using the document accessdefinition.
 162. The method of claim 161, further comprising generatinga first document object model tree using the XML document.
 163. Themethod of claim 162, wherein the first document object model tree isgenerated by parsing the XML document.
 164. The method of claim 161,further comprising generating a second document object model tree usingthe document access definition.
 165. The method of claim 164, whereinthe second document object model tree is generated by parsing thedocument access definition.
 166. The method of claim 161, whereinmapping further comprises: generating a first document object model treeusing data from the XML document; generating a second document objectmodel tree using a document access definition; and mapping the data fromthe first document object model tree into columns in one or morerelational tables using the second document object model tree.
 167. Themethod of claim 161, wherein the XML data is stored untagged into therelational tables.
 168. The method of claim 161, wherein the relationaltables are new tables.
 169. The method of claim 161, wherein therelational tables are existing tables.
 170. The method of claim 161,wherein the mapping is performed by a stored procedure.
 171. Anapparatus for transforming data, comprising: a computer having a datastore coupled thereto, wherein the data store stores data; and one ormore computer programs, performed by the computer, for receiving an XMLdocument containing XML data, receiving a document access definitionthat identifies one or 5 more relational tables and columns, and mappingthe XML data to the relational tables and columns using the documentaccess definition.
 172. The apparatus of claim 171, further comprisinggenerating a first document object model tree using the XML document.173. The apparatus of claim 172, wherein the first document object modeltree is generated by parsing the XML document.
 174. The apparatus ofclaim 171, further comprising generating a second document object modeltree using the document access definition.
 175. The apparatus of claim174, wherein the second document object model tree is generated byparsing the document access definition.
 176. The apparatus of claim 171,wherein mapping further comprises: generating a first document objectmodel tree using data from the XML document: generating a seconddocument object model tree using a document access definition; andmapping the data from the first document object model tree into columnsin one or more relational tables using the second document object modeltree.
 177. The apparatus of claim 171, wherein the XML data is storeduntagged into the relational tables.
 178. The apparatus of claim 171,wherein the relational tables are new tables.
 179. The apparatus ofclaim 171, wherein the relational tables are existing tables.
 180. Theapparatus of claim 171, wherein the mapping is performed by a storedprocedure.
 181. An article of manufacture comprising a program storagemedium readable by a computer and embodying one or more instructionsexecutable by the computer to perform steps for transforming data in adata store connected to the computer, comprising: receiving XML documentcontaining XML data; receiving a document access definition thatidentifies one or more relational tables and columns; and mapping theXML data to the relational tables and columns using the document accessdefinition.
 182. The article of manufacture of claim 181, furthercomprising generating a first document object model tree using the XMLdocument.
 183. The article of manufacture of claim 182, wherein thefirst document object model tree is generated by parsing the XMLdocument.
 184. The article of manufacture of claim 181, furthercomprising generating a second document object model tree using thedocument access definition.
 185. The article of manufacture of claim184. wherein the second document object model tree is generated byparsing the document access definition.
 186. The article of manufactureof claim 181, wherein mapping further comprises: generating a firstdocument object model tree using data from the XML document; generatinga second document object model tree using a document access definition;and mapping the data from the first document object model tree intocolumns in one or more relational tables using the second documentobject model tree.
 187. The article of manufacture of claim 181, whereinthe XML data is stored untagged into the relational tables.
 188. Thearticle of manufacture of claim 181, wherein the relational tables arenew tables.
 189. The article of manufacture of claim 181, wherein therelational tables are existing tables.
 190. The article of manufactureof claim 181, wherein the mapping is performed by a stored procedure.